Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano
Agent Success: 68% (agent success rate when using this tile)
Improvement: 0.94x (agent success rate improvement when using this tile compared to baseline)
Baseline: 72% (agent success rate without this tile)
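The three figures above appear to be related by a simple ratio. The short check below assumes (this is not stated on the page) that the improvement factor is the with-tile success rate divided by the baseline rate; the variable names are illustrative only.

```python
# Assumption: improvement = success rate with the tile / baseline success rate.
with_tile = 0.68   # agent success rate when using this tile
baseline = 0.72    # agent success rate without this tile

improvement = with_tile / baseline
print(f"{improvement:.2f}x")  # prints 0.94x
```

Under that reading, a factor below 1.0x means the with-tile runs scored slightly lower than the baseline runs.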
{
"context": "Evaluates how well the solution uses PyMC3 to manually block step methods and configure proposals for a simple Poisson switch model. Focuses on explicit step selection, proposal mapping, deterministic sampling, and extracting acceptance diagnostics tied to block assignments.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Model wiring",
"description": "Defines the Poisson switch model in a pm.Model context with a discrete switch (e.g., pm.Bernoulli or pm.Categorical) controlling the observed rate via pm.math.switch or equivalent, and continuous rate variables tracked for sampling.",
"max_score": 25
},
{
"name": "Manual blocking",
"description": "Constructs explicit step methods and combines them (pm.CompoundStep or list) so the continuous rates share a single gradient-based step such as pm.NUTS or pm.HamiltonianMC while the switch variable is isolated in its own step; passes this custom step structure to pm.sample instead of relying on automatic assignment.",
"max_score": 25
},
{
"name": "Proposal selection",
"description": "Maps proposal strings to concrete proposal distributions (pm.CauchyProposal for heavy-tailed, pm.NormalProposal for gaussian) and attaches them to the discrete step via pm.Metropolis/pm.BinaryMetropolis proposal_dist, matching the caller's choice.",
"max_score": 20
},
{
"name": "Deterministic runs",
"description": "Propagates the seed to PyMC3 sampling (pm.sample random_seed or equivalent plus NumPy seeding) so repeated calls with the same seed yield identical posterior means for rate_a and rate_b after tune is discarded.",
"max_score": 15
},
{
"name": "Acceptance reporting",
"description": "Derives per-block acceptance rates from PyMC3 sampler stats (e.g., trace.get_sampler_stats on each step) and returns them keyed by block names aligned with the manual block assignments.",
"max_score": 15
}
]
}
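The checklist describes four mechanics: manual blocking, a string-to-proposal mapping, seeded determinism, and per-block acceptance reporting. The sketch below illustrates those mechanics using only the Python standard library. It is not PyMC3 code: a plain random-walk Metropolis update stands in for the gradient-based step on the rates, and every name in it (`PROPOSALS`, `run_blocked_sampler`, the block names `rates` and `switch`, the Exponential(1) priors, the step scales) is a hypothetical choice made for illustration.

```python
import math
import random

# Hypothetical mapping from proposal names to symmetric draw functions.
PROPOSALS = {
    "normal": lambda rng, scale: rng.gauss(0.0, scale),
    # Cauchy draw via the inverse CDF (heavy-tailed).
    "cauchy": lambda rng, scale: scale * math.tan(math.pi * (rng.random() - 0.5)),
}

def log_post(state, counts):
    """Unnormalized log posterior of a simple Poisson switch model."""
    a, b, s = state["rate_a"], state["rate_b"], state["switch"]
    if a <= 0 or b <= 0 or s not in (0, 1):
        return -math.inf
    rate = a if s == 1 else b  # the switch picks which rate drives the data
    log_lik = sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in counts)
    return -a - b + log_lik  # Exponential(1) priors; Bernoulli(0.5) is constant

def mh_accept(rng, current, candidate, counts):
    """Metropolis acceptance test for a symmetric proposal."""
    log_ratio = log_post(candidate, counts) - log_post(current, counts)
    return rng.random() < math.exp(min(0.0, log_ratio))

def run_blocked_sampler(counts, proposal="normal", seed=0, draws=1000, tune=500):
    if proposal not in PROPOSALS:
        raise ValueError(f"unknown proposal: {proposal!r}")
    draw = PROPOSALS[proposal]
    rng = random.Random(seed)  # single seeded generator makes runs repeatable
    state = {"rate_a": 1.0, "rate_b": 1.0, "switch": 1}
    accepted = {"rates": 0, "switch": 0}
    sums = {"rate_a": 0.0, "rate_b": 0.0}
    for i in range(tune + draws):
        keep = i >= tune  # statistics are collected only after tuning
        # Block "rates": both continuous rates are updated jointly.
        cand = dict(state,
                    rate_a=state["rate_a"] + rng.gauss(0.0, 0.5),
                    rate_b=state["rate_b"] + rng.gauss(0.0, 0.5))
        if mh_accept(rng, state, cand, counts):
            state = cand
            accepted["rates"] += keep
        # Block "switch": isolated step using the caller's chosen proposal,
        # rounded to an integer jump (a zero jump counts as accepted).
        cand = dict(state, switch=state["switch"] + round(draw(rng, 1.0)))
        if mh_accept(rng, state, cand, counts):
            state = cand
            accepted["switch"] += keep
        if keep:
            sums["rate_a"] += state["rate_a"]
            sums["rate_b"] += state["rate_b"]
    return {
        "rate_a_mean": sums["rate_a"] / draws,
        "rate_b_mean": sums["rate_b"] / draws,
        "acceptance": {name: n / draws for name, n in accepted.items()},
    }
```

Calling `run_blocked_sampler([3, 4, 2, 5, 3], proposal="cauchy", seed=42)` twice should return identical dictionaries, and the `acceptance` entry is keyed by the same block names used when assigning the updates. In a PyMC3 solution the corresponding pieces would presumably be the step objects passed to `pm.sample`, its `random_seed` argument, and the sampler statistics on the returned trace, as the checklist describes.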