Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano
Agent Success: 68% (agent success rate when using this tile)
Improvement: 0.94x (agent success rate when using this tile, compared to baseline)
Baseline: 72% (agent success rate without this tile)
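The improvement figure appears to be the ratio of the two success rates rather than a difference. A minimal check, assuming that reading (the variable names are illustrative and not part of the page):

```python
# Success rates as reported above.
with_tile = 0.68   # agent success rate when using this tile
baseline = 0.72    # agent success rate without this tile

# Assumption: "Improvement" is with_tile / baseline.
ratio = with_tile / baseline
print(f"{ratio:.2f}x")  # 0.94x
```

Under this reading, a value below 1.0x means the success rate with the tile was lower than the baseline for this scenario.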
{
  "context": "Evaluates whether the solution uses PyMC's model context plus prior and posterior predictive utilities to build a simple count model, simulate draws reproducibly, and summarize predictive outputs as required by the spec.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Model structure",
      "description": "Creates a pm.Model with coords/dims for the day axis; defines a continuous log_rate prior (e.g., pm.Normal) and a count likelihood (e.g., pm.Poisson) for observed_counts using the day dimension.",
      "max_score": 20
    },
    {
      "name": "Prior predictive",
      "description": "Calls pm.sample_prior_predictive with the requested draws and random_seed from within the model context, requesting var_names that include log_rate and observed_counts and returning both with the expected shapes.",
      "max_score": 20
    },
    {
      "name": "Posterior sampling",
      "description": "Runs pm.sample (or pm.draw for deterministic transforms) with the supplied draws/tune and random_seed to produce a posterior trace suitable for predictive use, rather than manually simulating or bypassing PyMC inference.",
      "max_score": 20
    },
    {
      "name": "Posterior predictive",
      "description": "Invokes pm.sample_posterior_predictive using the posterior trace and model, correctly passing var_names and new data (via pm.Data or set_data) to produce observed_counts and forecast_counts draws with the specified shapes.",
      "max_score": 25
    },
    {
      "name": "Reproducibility & summaries",
      "description": "Controls randomness with PyMC random_seed handling (or numpy RNG passed through) so prior/posterior draws are repeatable, and computes medians directly from PyMC/ArviZ outputs (e.g., az.extract, pm.draw) instead of ad-hoc randomness.",
      "max_score": 15
    }
  ]
}
}