tessl/pypi-pymc3

Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano

Agent Success

Agent success rate when using this tile

68%

Improvement

Agent success rate improvement when using this tile compared to baseline

0.94x

Baseline

Agent success rate without this tile

72%


{
  "context": "Evaluates whether the solution uses PyMC's model context plus prior and posterior predictive utilities to build a simple count model, simulate draws reproducibly, and summarize predictive outputs as required by the spec.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Model structure",
      "description": "Creates a pm.Model with coords/dims for the day axis; defines a continuous log_rate prior (e.g., pm.Normal) and a count likelihood (e.g., pm.Poisson) for observed_counts using the day dimension.",
      "max_score": 20
    },
    {
      "name": "Prior predictive",
      "description": "Calls pm.sample_prior_predictive with the requested draws and random_seed from within the model context, requesting var_names that include log_rate and observed_counts and returning both with the expected shapes.",
      "max_score": 20
    },
    {
      "name": "Posterior sampling",
      "description": "Runs pm.sample (or pm.draw for deterministic transforms) with the supplied draws/tune and random_seed to produce a posterior trace suitable for predictive use, rather than manually simulating or bypassing PyMC inference.",
      "max_score": 20
    },
    {
      "name": "Posterior predictive",
      "description": "Invokes pm.sample_posterior_predictive using the posterior trace and model, correctly passing var_names and new data (via pm.Data or set_data) to produce observed_counts and forecast_counts draws with the specified shapes.",
      "max_score": 25
    },
    {
      "name": "Reproducibility & summaries",
      "description": "Controls randomness with PyMC random_seed handling (or numpy RNG passed through) so prior/posterior draws are repeatable, and computes medians directly from PyMC/ArviZ outputs (e.g., az.extract, pm.draw) instead of ad-hoc randomness.",
      "max_score": 15
    }
  ]
}
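The rubric's type is "weighted_checklist", and its five max_score values add up to 100. The page does not say how per-item scores are combined, so the following is only a minimal sketch under an assumed rule: each item's awarded points are clamped to its max_score, then the sum is divided by the total available. The names CHECKLIST, total_max, weighted_score, and example_awarded are hypothetical, and the awarded points are made-up illustration values, not real eval output.

```python
from typing import Dict, List, Tuple

# Item names and max_score values copied from the rubric above.
CHECKLIST: List[Tuple[str, int]] = [
    ("Model structure", 20),
    ("Prior predictive", 20),
    ("Posterior sampling", 20),
    ("Posterior predictive", 25),
    ("Reproducibility & summaries", 15),
]


def total_max(checklist: List[Tuple[str, int]]) -> int:
    """Sum of all max_score values in the checklist."""
    return sum(cap for _, cap in checklist)


def weighted_score(checklist: List[Tuple[str, int]],
                   awarded: Dict[str, float]) -> float:
    """Assumed aggregation (not confirmed by the source):
    clamp each awarded score to [0, max_score], sum, and normalize."""
    earned = 0.0
    for name, cap in checklist:
        points = awarded.get(name, 0.0)  # missing items count as zero
        earned += min(max(points, 0.0), float(cap))
    return earned / total_max(checklist)


# Hypothetical per-item scores, for illustration only.
example_awarded: Dict[str, float] = {
    "Model structure": 20,
    "Prior predictive": 15,
    "Posterior sampling": 20,
    "Posterior predictive": 10,
    "Reproducibility & summaries": 15,
}

if __name__ == "__main__":
    print("total available:", total_max(CHECKLIST))
    print("normalized score:", weighted_score(CHECKLIST, example_awarded))
```

Under this assumed rule, "Posterior predictive" carries the most weight (25 of 100), so a solution that skips the forecast draws loses more than one that skips any other single item.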

evals/scenario-2/rubric.json