tessl/pypi-pymc3

Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano

Agent Success

Agent success rate when using this tile

68%

Improvement

Agent success rate improvement when using this tile compared to baseline

0.94x

Baseline

Agent success rate without this tile

72%
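
The improvement figure is consistent with a plain ratio of the two success rates above. The page does not state the formula, so the check below is a minimal sketch under that assumption:

```python
# Assumption: "improvement" = success rate with the tile / baseline success rate.
with_tile = 0.68  # agent success rate when using this tile
baseline = 0.72   # agent success rate without this tile

improvement = with_tile / baseline
print(f"{improvement:.2f}x")
```

Under this reading, a value below 1.0x means agents succeeded less often with the tile than without it.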


{
  "context": "Evaluates how well the solution uses PyMC3 to manually block step methods and configure proposals for a simple Poisson switch model. Focuses on explicit step selection, proposal mapping, deterministic sampling, and extracting acceptance diagnostics tied to block assignments.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Model wiring",
      "description": "Defines the Poisson switch model in a pm.Model context with a discrete switch (e.g., pm.Bernoulli or pm.Categorical) controlling the observed rate via pm.math.switch or equivalent, and continuous rate variables tracked for sampling.",
      "max_score": 25
    },
    {
      "name": "Manual blocking",
      "description": "Constructs explicit step methods and combines them (pm.CompoundStep or list) so the continuous rates share a single gradient-based step such as pm.NUTS or pm.HamiltonianMC while the switch variable is isolated in its own step; passes this custom step structure to pm.sample instead of relying on automatic assignment.",
      "max_score": 25
    },
    {
      "name": "Proposal selection",
      "description": "Maps proposal strings to concrete proposal distributions (pm.CauchyProposal for heavy-tailed, pm.NormalProposal for gaussian) and attaches them to the discrete step via pm.Metropolis/pm.BinaryMetropolis proposal_dist, matching the caller's choice.",
      "max_score": 20
    },
    {
      "name": "Deterministic runs",
      "description": "Propagates the seed to PyMC3 sampling (pm.sample random_seed or equivalent plus NumPy seeding) so repeated calls with the same seed yield identical posterior means for rate_a and rate_b after tune is discarded.",
      "max_score": 15
    },
    {
      "name": "Acceptance reporting",
      "description": "Derives per-block acceptance rates from PyMC3 sampler stats (e.g., trace.get_sampler_stats on each step) and returns them keyed by block names aligned with the manual block assignments.",
      "max_score": 15
    }
  ]
}
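
Since the rubric type is `weighted_checklist`, one plausible way to read a result is earned points over available points. The sketch below assumes that aggregation rule; the `total_score` helper and the awarded values are hypothetical and are not taken from the source. Only the item names and `max_score` values are copied from the rubric.

```python
# Item names and max_score values copied from the rubric above.
rubric = {
    "checklist": [
        {"name": "Model wiring", "max_score": 25},
        {"name": "Manual blocking", "max_score": 25},
        {"name": "Proposal selection", "max_score": 20},
        {"name": "Deterministic runs", "max_score": 15},
        {"name": "Acceptance reporting", "max_score": 15},
    ]
}


def total_score(rubric, awarded):
    """Return (earned, available), clamping each award to [0, max_score]."""
    earned = 0
    available = 0
    for item in rubric["checklist"]:
        cap = item["max_score"]
        available += cap
        earned += max(0, min(awarded.get(item["name"], 0), cap))
    return earned, available


# Hypothetical grader output, for illustration only.
awarded = {
    "Model wiring": 25,
    "Manual blocking": 20,
    "Proposal selection": 10,
    "Deterministic runs": 15,
    "Acceptance reporting": 0,
}

earned, available = total_score(rubric, awarded)
print(f"{earned}/{available}")
```

The five `max_score` values sum to 100, so under this assumed rule the earned total can be read directly as a percentage.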


evals/scenario-3/rubric.json
