Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano
Agent Success: 68% (agent success rate when using this tile)
Improvement: 0.94x (agent success rate when using this tile, relative to baseline)
Baseline: 72% (agent success rate without this tile)
{
"context": "Evaluates whether the solution uses PyMC's causal intervention/observation utilities and graph exporters to build and query the marketing model. Focuses on correct application of PyMC APIs for posterior sampling, do/observe scenarios, predictive draws, and graph outputs.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Model structure",
"description": "Builds the causal graph in a pm.Model with nodes for organic_visits -> campaign_spend -> signups (plus organic_visits -> signups), using PyMC random variables and deterministic nodes rather than manual math.",
"max_score": 15
},
{
"name": "Posterior fit",
"description": "Uses pm.sample (or equivalent PyMC posterior sampler) to fit the baseline model and returns the trace for reuse instead of re-sampling per call.",
"max_score": 10
},
{
"name": "Do intervention",
"description": "Implements campaign_spend interventions with pm.do or the do-transform utilities so predictive draws ignore observed spend and clamp to the provided value.",
"max_score": 25
},
{
"name": "Observation update",
"description": "Implements conditioning on a new organic_visits value with pm.observe or pm.set_data so the posterior predictive reflects the provided observation without altering other edges.",
"max_score": 20
},
{
"name": "Predictive draws",
"description": "Generates posterior predictive samples for the observational, interventional, and observed scenarios using pm.sample_posterior_predictive (or pm.draw) with the requested draw counts.",
"max_score": 10
},
{
"name": "Graph exports",
"description": "Produces Graphviz DOT and Mermaid text using PyMC graph exporters (e.g., pm.model_to_graphviz, pm.model_to_mermaid, or pm.model_to_networkx) and ensures edges and node labels match the causal structure.",
"max_score": 15
},
{
"name": "Seed control",
"description": "Threads the random_seed argument into pm.sample and predictive utilities so repeated runs with the same seed yield consistent outputs.",
"max_score": 5
}
]
}