
leo-laptop/calculate

Use when the user asks you to calculate, compute, evaluate, or solve a math expression or equation. Triggers on arithmetic, order of operations (PEMDAS), fractions, percentages, exponents, and multi-step math problems.

84

1.00x
Quality

78%

Does it follow best practices?

Impact

94%

1.00x

Average score across 5 eval scenarios

Security by Snyk

Passed

No known issues


evals/scenario-1/criteria.json

{
  "context": "Tests whether the agent follows the prescribed response format for math evaluations: restating the expression, identifying groupings, working through each PEMDAS step with labels and intermediate results, and stating the final answer clearly. All three expressions exercise multiple PEMDAS tiers.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Expression restated",
      "description": "Each of the three expressions is restated or quoted clearly near the start of its solution section",
      "max_score": 7
    },
    {
      "name": "Parentheses step labeled",
      "description": "For at least two solutions, the parentheses/grouping step is explicitly labeled (e.g., 'Step 1 — Parentheses' or equivalent)",
      "max_score": 10
    },
    {
      "name": "Exponents step labeled",
      "description": "For at least two solutions, the exponents step is explicitly labeled (e.g., 'Step 2 — Exponents' or equivalent)",
      "max_score": 10
    },
    {
      "name": "Mult/Div step labeled",
      "description": "For at least two solutions, the multiplication/division step is explicitly labeled (e.g., 'Step 3 — Mult/Div' or equivalent)",
      "max_score": 10
    },
    {
      "name": "Add/Sub step labeled",
      "description": "For at least two solutions, the addition/subtraction step is explicitly labeled (e.g., 'Step 4 — Add/Sub' or equivalent)",
      "max_score": 10
    },
    {
      "name": "Intermediate results shown",
      "description": "Each solution shows intermediate results after each step (not just the final answer)",
      "max_score": 10
    },
    {
      "name": "None noted for absent steps",
      "description": "At least one solution explicitly marks a PEMDAS step as '(none)' or equivalent when that operation type does not appear in the expression",
      "max_score": 8
    },
    {
      "name": "Correct answer expr 1",
      "description": "Expression 1 (3 + 6 × (5 + 4) ÷ 3 - 7) evaluates to 14",
      "max_score": 10
    },
    {
      "name": "Correct answer expr 2",
      "description": "Expression 2 (2^4 - (3 + 1) * 2) evaluates to 8",
      "max_score": 10
    },
    {
      "name": "Correct answer expr 3",
      "description": "Expression 3 (10 ÷ 2 + 3^2 * (4 - 1)) evaluates to 32",
      "max_score": 10
    },
    {
      "name": "Final answer clearly stated",
      "description": "Each solution ends with a clearly labeled final answer (e.g., 'Answer: X' or equivalent)",
      "max_score": 5
    }
  ]
}
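
The checklist above can be sanity-checked with a short script. The following is a minimal sketch, not part of the tile itself: it transcribes the three expressions into Python syntax (× becomes `*`, ÷ becomes `/`, ^ becomes `**`) and confirms the answers the checklist expects. The names `MAX_SCORES`, `expected_answers`, and `score_percent` are hypothetical, and the scoring rule (awarded points divided by the total of all `max_score` values) is an assumption about how a `weighted_checklist` might be totaled, not documented behavior.

```python
# max_score values copied from the checklist, in order of appearance.
MAX_SCORES = [7, 10, 10, 10, 10, 10, 8, 10, 10, 10, 5]


def expected_answers():
    """Evaluate the three checklist expressions using Python's precedence,
    which follows the same order of operations as PEMDAS here."""
    return {
        "expr 1": 3 + 6 * (5 + 4) / 3 - 7,
        "expr 2": 2**4 - (3 + 1) * 2,
        "expr 3": 10 / 2 + 3**2 * (4 - 1),
    }


def score_percent(awarded):
    """Assumed scoring rule: total awarded points over total possible points.

    Raises ValueError if an item is awarded more than its max_score.
    """
    if len(awarded) != len(MAX_SCORES):
        raise ValueError("one awarded score is required per checklist item")
    for got, cap in zip(awarded, MAX_SCORES):
        if not 0 <= got <= cap:
            raise ValueError(f"awarded score {got} is outside 0..{cap}")
    return 100 * sum(awarded) / sum(MAX_SCORES)


if __name__ == "__main__":
    for name, value in expected_answers().items():
        print(name, "=", value)
    print("total possible points:", sum(MAX_SCORES))
```

Under this assumed rule, a response that earns full marks on every item scores 100 percent, and a response that misses only the "None noted for absent steps" item (8 points) scores 92 percent.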
