tessl/pypi-pymc3

Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano

Agent success: 68% (agent success rate when using this tile)

Improvement: 0.94x (agent success rate when using this tile compared to baseline)

Baseline: 72% (agent success rate without this tile)
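
The improvement figure matches the ratio of the two success rates. A value below 1 therefore means agents did slightly worse with the tile than without it:

$$\text{Improvement} = \frac{68\%}{72\%} \approx 0.944 \approx 0.94\times$$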


{
  "context": "Evaluates how well the solution uses PyMC's variational inference APIs to fit a logistic regression with both mean-field and full-rank approximations, track ELBO progress, control randomness, and generate posterior predictive probabilities for held-out points.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Modeling with PyMC",
      "description": "Defines the logistic regression inside a pm.Model context using pm.Normal (or similar) priors for the intercept and slope, links the linear predictor through pm.math.sigmoid or pm.math.invlogit into a pm.Bernoulli likelihood, and binds X_train/y_train via shared data constructs (e.g., pm.Data).",
      "max_score": 20
    },
    {
      "name": "Mean-field VI",
      "description": "Runs a mean-field variational fit via pm.fit with a pm.ADVI (or equivalent mean-field) method for vi_steps iterations, captures its ELBO history (e.g., approx.hist, which records the per-iteration loss, i.e. the negative ELBO) into mean_field_elbo, and keeps the resulting Approximation object separate from other fits.",
      "max_score": 20
    },
    {
      "name": "Full-rank VI",
      "description": "Runs a dense-covariance variational fit via pm.fit with a pm.FullRankADVI (or equivalent full-rank) method on the same model for the same vi_steps, records its ELBO history into full_rank_elbo, and preserves the distinct Approximation for later comparison.",
      "max_score": 20
    },
    {
      "name": "Seed control",
      "description": "Applies the random_seed argument to pm.fit (and, where the API accepts it, to the sampling step, e.g. pm.sample_posterior_predictive) so that repeated runs with the same seed yield identical ELBO traces and predictive probabilities.",
      "max_score": 15
    },
    {
      "name": "Posterior predictive",
      "description": "Compares the final ELBOs, selects the better-performing Approximation (higher final ELBO), draws posterior samples with approx.sample(draws), and uses pm.sample_posterior_predictive (or an equivalent direct evaluation of the model) to produce test_probabilities in the same order as X_test, with clear separation (below 0.2 for negative inputs, above 0.8 for positive inputs).",
      "max_score": 25
    }
  ]
}


evals/scenario-7/rubric.json