tessl/pypi-pymc3

Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano

Agent Success: 68% (agent success rate when using this tile)

Improvement: 0.94x (agent success rate when using this tile, relative to baseline)

Baseline: 72% (agent success rate without this tile)


{
  "context": "Evaluates whether the solution uses PyMC's causal intervention/observation utilities and graph exporters to build and query the marketing model. Focuses on correct application of PyMC APIs for posterior sampling, do/observe scenarios, predictive draws, and graph outputs.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Model structure",
      "description": "Builds the causal graph in a pm.Model with nodes for organic_visits -> campaign_spend -> signups (plus organic_visits -> signups), using PyMC random variables and deterministic nodes rather than manual math.",
      "max_score": 15
    },
    {
      "name": "Posterior fit",
      "description": "Uses pm.sample (or equivalent PyMC posterior sampler) to fit the baseline model and returns the trace for reuse instead of re-sampling per call.",
      "max_score": 10
    },
    {
      "name": "Do intervention",
      "description": "Implements campaign_spend interventions with pm.do or the do-transform utilities so predictive draws ignore observed spend and clamp to the provided value.",
      "max_score": 25
    },
    {
      "name": "Observation update",
      "description": "Conditions on new organic_visits data with pm.observe or pm.set_data so the posterior predictive reflects the provided observation without altering other edges.",
      "max_score": 20
    },
    {
      "name": "Predictive draws",
      "description": "Generates posterior predictive samples for observational, interventional, and observed scenarios using pm.sample_posterior_predictive (or pm.draw/pm.fast_sample_posterior_predictive) with requested draw counts.",
      "max_score": 10
    },
    {
      "name": "Graph exports",
      "description": "Produces Graphviz DOT and Mermaid text using PyMC graph exporters (e.g., pm.model_to_graphviz, pm.model_to_mermaid, or pm.model_to_networkx) and ensures edges and node labels match the causal structure.",
      "max_score": 15
    },
    {
      "name": "Seed control",
      "description": "Threads the random_seed argument into pm.sample and predictive utilities so repeated runs with the same seed yield consistent outputs.",
      "max_score": 5
    }
  ]
}
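
The rubric above is a weighted checklist whose max_score values sum to 100. The following is a minimal sketch of how per-item awards could be combined into a single score. The aggregation rule (points earned divided by points available) and the names RUBRIC and weighted_score are assumptions made for illustration; the rubric itself does not state how the evaluator combines items.

```python
# Checklist names and max_score values copied from the rubric above.
RUBRIC = {
    "Model structure": 15,
    "Posterior fit": 10,
    "Do intervention": 25,
    "Observation update": 20,
    "Predictive draws": 10,
    "Graph exports": 15,
    "Seed control": 5,
}


def weighted_score(awarded, rubric=RUBRIC):
    """Return the fraction of available points earned.

    `awarded` maps checklist names to the points granted for that item.
    Items missing from `awarded` count as zero. This aggregation rule is
    an assumption, not something the rubric specifies.
    """
    unknown = set(awarded) - set(rubric)
    if unknown:
        raise ValueError(f"unknown checklist items: {sorted(unknown)}")

    earned = 0
    for name, max_score in rubric.items():
        points = awarded.get(name, 0)
        if not 0 <= points <= max_score:
            raise ValueError(f"{name}: {points} is outside 0..{max_score}")
        earned += points

    return earned / sum(rubric.values())


if __name__ == "__main__":
    # Hypothetical grading: full marks except no points for the intervention item.
    example = {name: pts for name, pts in RUBRIC.items() if name != "Do intervention"}
    print(f"{weighted_score(example):.2f}")
```

Under this assumed rule, the "Do intervention" and "Observation update" items together carry 45 of the 100 available points, so a solution that skips the causal utilities cannot score above 0.55.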

evals/scenario-8/rubric.json