Use when the user asks you to calculate, compute, evaluate, or solve a math expression or equation. Triggers on arithmetic, order of operations (PEMDAS), fractions, percentages, exponents, and multi-step math problems.
{
"context": "Tests whether the agent correctly handles negative numbers in multiplication, negation applied to parenthesized groups, and negative exponents. Also checks for step-by-step format with labeled PEMDAS stages and clear final answers.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Correct result formula 1",
"description": "Formula 1 (-5 * -3 + 10): correct answer is 25 (negative times negative = positive 15, plus 10)",
"max_score": 10
},
{
"name": "Correct result formula 2",
"description": "Formula 2 (50 - -(8 + 2)): correct answer is 60 (parentheses give 10, negation gives -10, then 50 - (-10) = 60)",
"max_score": 10
},
{
"name": "Correct result formula 3",
"description": "Formula 3 (-3 * (4 + 2) + -(1 + 4)): correct answer is -23 (parentheses first: 6 and 5; then -3*6 = -18 and -(5) = -5; then -18 + -5 = -23)",
"max_score": 10
},
{
"name": "Correct result formula 4",
"description": "Formula 4 ((-2)^3 + -(6 - 10)): correct answer is -4 ((-2)^3 = -8, -(6-10) = -(-4) = 4, then -8 + 4 = -4)",
"max_score": 10
},
{
"name": "Negative times negative shown",
"description": "Formula 1 solution explicitly shows that -5 * -3 = 15 (not -15), demonstrating correct handling of negative multiplication",
"max_score": 10
},
{
"name": "Negated group handled correctly",
"description": "At least one solution explicitly shows that -(group) distributes the negative sign to the result of the group (e.g., -(8+2) = -(10) = -10)",
"max_score": 15
},
{
"name": "Parentheses resolved first",
"description": "For formulas 2, 3, and 4, the parenthesized subexpression is evaluated before the outer negation or multiplication is applied",
"max_score": 10
},
{
"name": "Step-by-step work shown",
"description": "Each formula includes intermediate results, not just a final answer",
"max_score": 8
},
{
"name": "PEMDAS steps labeled",
"description": "At least two formulas use labeled PEMDAS step headings (e.g., 'Step 1 — Parentheses', 'Step 3 — Mult/Div', or equivalent)",
"max_score": 9
},
{
"name": "Final answer clearly stated",
"description": "Each formula ends with a clearly labeled or highlighted final answer",
"max_score": 8
}
]
}
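
As a quick sanity check on the expected answers in the checklist, the four formulas can be evaluated directly. The sketch below is illustrative only and is not part of the eval harness: the names `CASES`, `MAX_SCORES`, and `find_mismatches` are hypothetical, and it assumes `^` in Formula 4 means exponentiation (written `**` in Python, where `^` is bitwise XOR).

```python
# Each case: (formula as written in the checklist, Python evaluation, expected answer).
# Assumption: "^" in the checklist means exponentiation, so it is written "**" here.
CASES = [
    ("-5 * -3 + 10", -5 * -3 + 10, 25),
    ("50 - -(8 + 2)", 50 - -(8 + 2), 60),
    ("-3 * (4 + 2) + -(1 + 4)", -3 * (4 + 2) + -(1 + 4), -23),
    ("(-2)^3 + -(6 - 10)", (-2) ** 3 + -(6 - 10), -4),
]

# max_score values copied from the checklist, in order.
MAX_SCORES = [10, 10, 10, 10, 10, 15, 10, 8, 9, 8]


def find_mismatches(cases=CASES):
    """Return the cases whose computed value differs from the expected answer."""
    return [
        (formula, computed, expected)
        for formula, computed, expected in cases
        if computed != expected
    ]


if __name__ == "__main__":
    print(find_mismatches())  # prints []
    print(sum(MAX_SCORES))    # prints 100
```

An empty mismatch list means the checklist's stated answers agree with ordinary operator precedence, and the weights summing to 100 means each `max_score` can be read as a percentage of the total.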