Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano
Agent Success: 68% (agent success rate when using this tile)
Improvement: 0.94x (agent success rate when using this tile relative to baseline)
Baseline: 72% (agent success rate without this tile)
{
"context": "Evaluates how PyMC is used to build a simple coin-bias model and rely on its automatic MCMC step assignment to generate reproducible posterior summaries. Checks focus on correct use of PyMC modeling primitives, sampling defaults, and summary helpers rather than general coding style.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Model setup",
"description": "Builds the coin model inside a pm.Model using a pm.Beta(2,2) prior on the bias and a pm.Bernoulli or pm.Binomial likelihood that consumes the provided head/tail counts.",
"max_score": 20
},
{
"name": "Auto sampler",
"description": "Invokes pm.sample with the draws and tune arguments while leaving step methods unspecified so PyMC performs automatic selection (e.g., NUTS for the continuous bias) instead of manually constructing samplers.",
"max_score": 25
},
{
"name": "Seed reuse",
"description": "Passes the random_seed argument through to pm.sample (or an equivalent rng_seeder) so repeated runs with the same seed reproduce posterior statistics and sampler initialization.",
"max_score": 15
},
{
"name": "Posterior array",
"description": "Extracts the bias samples from the pm.sample InferenceData (e.g., via idata.posterior) and returns them as a 1D numpy array with at least the requested number of draws after burn-in.",
"max_score": 20
},
{
"name": "HDI via package",
"description": "Computes the 94% highest density interval using package tooling such as pm.stats.hdi or arviz.hdi on the PyMC posterior draws and includes it in the returned dictionary.",
"max_score": 20
}
]
}
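
For reference, below is a minimal sketch of a submission that would satisfy these checks, assuming PyMC v5 and ArviZ. The function name coin_bias_posterior, the example head/tail counts, and the draws/tune values are illustrative choices, not part of the task specification.

import arviz as az
import pymc as pm

def coin_bias_posterior(heads, tails, draws=2000, seed=42):
    """Sample the posterior of a coin's bias and report a 94% HDI."""
    n = heads + tails
    with pm.Model():
        # Beta(2, 2) prior on the bias, Binomial likelihood on the head count.
        p = pm.Beta("p", alpha=2.0, beta=2.0)
        pm.Binomial("obs", n=n, p=p, observed=heads)
        # No step argument: PyMC auto-assigns NUTS for the continuous bias.
        idata = pm.sample(draws=draws, tune=1000, random_seed=seed,
                          progressbar=False)

    # Flatten the (chain, draw) posterior into a 1D array of bias samples.
    samples = idata.posterior["p"].values.flatten()
    # 94% highest density interval computed via ArviZ on the posterior draws.
    hdi_94 = az.hdi(idata, var_names=["p"], hdi_prob=0.94)["p"].values
    return {"samples": samples, "hdi_94": hdi_94}

result = coin_bias_posterior(heads=7, tails=3)
print(result["hdi_94"])

Because random_seed is passed straight through to pm.sample, re-running coin_bias_posterior with the same seed should reproduce the posterior statistics, which is what the "Seed reuse" check looks for.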