Use when the user asks you to calculate, compute, evaluate, or solve a math expression or equation. Triggers on arithmetic, order of operations (PEMDAS), fractions, percentages, exponents, and multi-step math problems.
{
"context": "Tests whether the agent follows the prescribed response format for math evaluations: restating the expression, identifying groupings, working through each PEMDAS step with labels and intermediate results, and stating the final answer clearly. All three expressions exercise multiple PEMDAS tiers.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Expression restated",
"description": "Each of the three expressions is restated or quoted clearly near the start of its solution section",
"max_score": 7
},
{
"name": "Parentheses step labeled",
"description": "For at least two solutions, the parentheses/grouping step is explicitly labeled (e.g., 'Step 1 — Parentheses' or equivalent)",
"max_score": 10
},
{
"name": "Exponents step labeled",
"description": "For at least two solutions, the exponents step is explicitly labeled (e.g., 'Step 2 — Exponents' or equivalent)",
"max_score": 10
},
{
"name": "Mult/Div step labeled",
"description": "For at least two solutions, the multiplication/division step is explicitly labeled (e.g., 'Step 3 — Mult/Div' or equivalent)",
"max_score": 10
},
{
"name": "Add/Sub step labeled",
"description": "For at least two solutions, the addition/subtraction step is explicitly labeled (e.g., 'Step 4 — Add/Sub' or equivalent)",
"max_score": 10
},
{
"name": "Intermediate results shown",
"description": "Each solution shows intermediate results after each step (not just the final answer)",
"max_score": 10
},
{
"name": "None noted for absent steps",
"description": "At least one solution explicitly marks a PEMDAS step as '(none)' or equivalent when that operation type does not appear in the expression",
"max_score": 8
},
{
"name": "Correct answer expr 1",
"description": "Expression 1 (3 + 6 × (5 + 4) ÷ 3 - 7) evaluates to 14",
"max_score": 10
},
{
"name": "Correct answer expr 2",
"description": "Expression 2 (2^4 - (3 + 1) * 2) evaluates to 8",
"max_score": 10
},
{
"name": "Correct answer expr 3",
"description": "Expression 3 (10 ÷ 2 + 3^2 * (4 - 1)) evaluates to 32",
"max_score": 10
},
{
"name": "Final answer clearly stated",
"description": "Each solution ends with a clearly labeled final answer (e.g., 'Answer: X' or equivalent)",
"max_score": 5
}
]
}
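
The three expected answers in the checklist can be cross-checked mechanically. The sketch below is not part of the rubric: it assumes the expressions are transcribed by hand into Python operators (× becomes `*`, ÷ becomes `/`, ^ becomes `**`), and the names `EXPECTED` and `check_expected_answers` are hypothetical. Python applies the same precedence rules as PEMDAS for these operators, so agreement here suggests the rubric's stated answers are consistent.

```python
# Illustrative cross-check only; the rubric does not prescribe any code.
# Each expression is hand-transcribed from the checklist descriptions:
#   × -> *, ÷ -> /, ^ -> **
EXPECTED = {
    "3 + 6 × (5 + 4) ÷ 3 - 7": (3 + 6 * (5 + 4) / 3 - 7, 14),
    "2^4 - (3 + 1) * 2": (2**4 - (3 + 1) * 2, 8),
    "10 ÷ 2 + 3^2 * (4 - 1)": (10 / 2 + 3**2 * (4 - 1), 32),
}


def check_expected_answers():
    """Map each expression to True when Python's result equals the rubric's answer."""
    return {
        expr: computed == expected
        for expr, (computed, expected) in EXPECTED.items()
    }


if __name__ == "__main__":
    for expr, ok in check_expected_answers().items():
        print(f"{expr}: {'matches' if ok else 'differs'}")
```

Note that `/` yields a float in Python, so the first and third results are `14.0` and `32.0`; the equality comparison against the integer answers still holds.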