Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano
Agent Success: 68% (agent success rate when using this tile)
Improvement: 0.94x (agent success rate when using this tile compared to baseline)
Baseline: 72% (agent success rate without this tile)
{
  "context": "Evaluates whether the solution builds the segmented conversion model in PyMC using distributions for priors and likelihoods, deterministic transforms for rates and summaries, and a potential term to enforce the target overall rate. Rewards correct use of PyMC primitives that align with the specified variables and penalty structure.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Model setup",
      "description": "Uses a pm.Model context to construct the model and returns the pm.Model with named components (baseline_logit, segment_offset, segment_rate_<name>, overall_rate) matching the spec.",
      "max_score": 15
    },
    {
      "name": "Priors & offsets",
      "description": "Defines baseline_logit as a prior random variable (e.g., pm.Normal) and segment_offset as a vector prior sized to the unique segment_ids, with offsets indexed by segment_ids when computing rates.",
      "max_score": 20
    },
    {
      "name": "Logistic rates",
      "description": "Computes per-row conversion probabilities with pm.math.sigmoid(baseline_logit + segment_offset[segment_ids]) and records them via pm.Deterministic for trace inspection.",
      "max_score": 20
    },
    {
      "name": "Deterministic summaries",
      "description": "Publishes per-segment deterministic rates using pm.Deterministic named segment_rate_<name> for each unique segment and an overall_rate deterministic equal to the exposure-weighted mean of per-row rates.",
      "max_score": 15
    },
    {
      "name": "Observed likelihood",
      "description": "Ties conversions to exposures with a count likelihood using pm.Binomial (n=exposures, p=per-row rates) and observed=conversions inside the model.",
      "max_score": 15
    },
    {
      "name": "Penalty potential",
      "description": "Adds pm.Potential with value -penalty_scale * (overall_rate - target_overall_rate) ** 2 so deviations from the target rate reduce log-density, applied regardless of target boundary values.",
      "max_score": 15
    }
  ]
}