tessl/pypi-pymc3

Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano

Agent Success: 68% (agent success rate when using this tile)

Improvement: 0.94x (agent success rate with this tile relative to the baseline)

Baseline: 72% (agent success rate without this tile)
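The Improvement multiplier matches the ratio of the two success rates. Assuming that is how it is computed, a value below 1 means agents did slightly worse with the tile than without it:

```latex
\text{Improvement} = \frac{\text{success rate with tile}}{\text{baseline success rate}} = \frac{68\%}{72\%} \approx 0.944 \approx 0.94\times
```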

{
  "context": "Evaluates how well the solution uses PyMC's labeled-dimension tooling for store/week data, including updatable shared data containers and minibatching that preserves coordinate information. Scoring checks for correct use of coords/dims, mutable data updates, and minibatch scaling so that inference matches the full-data intent.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Coords Model",
      "description": "Defines the model with pm.Model(coords=...) and applies dims referencing store/week coordinates on all stochastic and observed variables so posterior groups keep those labels.",
      "max_score": 25
    },
    {
      "name": "Shared Data",
      "description": "Binds the sales, price, and promo arrays through pm.Data (or pm.MutableData in later PyMC releases) with dims tied to coords, instead of passing bare arrays or tensors.",
      "max_score": 20
    },
    {
      "name": "Data Updates",
      "description": "Updates observed inputs via pm.set_data or Data.set_value without rebuilding the model while preserving coordinate alignment.",
      "max_score": 15
    },
    {
      "name": "Minibatch Inputs",
      "description": "Uses pm.Minibatch for the observed data and features, passes total_size equal to the full dataset size to the observed distribution, and keeps dims consistent with the labeled coords.",
      "max_score": 20
    },
    {
      "name": "Loglik Scaling",
      "description": "Ensures the minibatch log-likelihood is scaled to represent the full dataset (e.g., via the total_size argument of the observed distribution, which multiplies it by total_size / batch_size, or via manual weighting).",
      "max_score": 10
    },
    {
      "name": "Posterior Labels",
      "description": "Posterior and posterior_predictive outputs retain store/week coordinate names, showing dims propagated through sampling and prediction.",
      "max_score": 10
    }
  ]
}
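The Minibatch Inputs and Loglik Scaling items depend on one piece of arithmetic. If each minibatch holds B of the N rows, multiplying its summed log-likelihood by N / B gives an unbiased estimate of the full-data log-likelihood. PyMC3 documents `total_size` on a variable as upscaling its logp by `total_size / var.shape[0]`, which is this same factor. The sketch below checks the arithmetic in plain Python rather than PyMC. The Normal likelihood, the synthetic sales data, the batch size, and the helper names (`normal_logpdf`, `full_loglik`, `scaled_minibatch_loglik`) are illustrative assumptions.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    """Log-density of Normal(mu, sigma) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def full_loglik(data, mu, sigma):
    """Exact log-likelihood summed over every row of the dataset."""
    return sum(normal_logpdf(x, mu, sigma) for x in data)


def scaled_minibatch_loglik(batch, total_size, mu, sigma):
    """Minibatch log-likelihood multiplied by total_size / batch_size (hypothetical helper)."""
    scale = total_size / len(batch)
    return scale * sum(normal_logpdf(x, mu, sigma) for x in batch)


rng = random.Random(0)

# Hypothetical weekly sales figures: N = 1000 rows, minibatches of B = 100.
sales = [rng.gauss(50.0, 5.0) for _ in range(1000)]
N, B = len(sales), 100
MU, SIGMA = 50.0, 5.0

exact = full_loglik(sales, MU, SIGMA)

# Average the scaled estimate over many random minibatches (drawn without
# replacement); the average should land close to the exact full-data value.
estimates = [
    scaled_minibatch_loglik(rng.sample(sales, B), N, MU, SIGMA)
    for _ in range(2000)
]
mean_estimate = sum(estimates) / len(estimates)

# Without the N / B factor, a single minibatch sum is roughly N / B = 10
# times smaller in magnitude than the full-data log-likelihood.
unscaled = sum(normal_logpdf(x, MU, SIGMA) for x in rng.sample(sales, B))

print(f"exact full-data loglik:     {exact:.1f}")
print(f"mean scaled minibatch est.: {mean_estimate:.1f}")
print(f"one unscaled minibatch sum: {unscaled:.1f}")
```

Inside a PyMC model the library applies this factor when `total_size` is set, so it would not normally be written by hand. The explicit version corresponds to the "manual weighting" alternative the rubric mentions and is shown only to make the factor visible.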

evals/scenario-4/rubric.json