tessl/pypi-pymc3

Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano

Agent Success

Agent success rate when using this tile

68%

Improvement

Agent success rate improvement when using this tile compared to baseline

0.94x

Baseline

Agent success rate without this tile

72%

{
  "context": "Evaluates whether the solution leverages PyMC's log-probability utilities on PyTensor graphs to deliver log densities, log CDFs, and quantiles for a mean/scale continuous variable without resorting to manual probability math.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Graph distribution",
      "description": "Defines the underlying variable as a PyMC distribution node (e.g., pm.Normal) parameterized by mean and sigma instead of coding density formulas by hand.",
      "max_score": 20
    },
    {
      "name": "Log density via logp",
      "description": "Uses pm.logp (or pm.logprob) on the distribution to produce elementwise log densities that respect input shape, avoiding raw NumPy math or ad hoc loops.",
      "max_score": 25
    },
    {
      "name": "Tail probability",
      "description": "Computes log CDF values using pm.logcdf on the same distribution rather than manual CDF calculations or approximate lookups.",
      "max_score": 20
    },
    {
      "name": "Quantile via icdf",
      "description": "Obtains quantiles through pm.icdf (or the distribution's icdf) driven by input probabilities, not hardcoded inverse-CDF constants or numerical solvers.",
      "max_score": 20
    },
    {
      "name": "Tensor outputs",
      "description": "Returns PyTensor TensorVariables and relies on pm.math/pt operations so results stay symbolic, broadcast with array inputs, and compose in larger graphs.",
      "max_score": 15
    }
  ]
}

rubric.json (evals/scenario-1/)