Diagnoses and routes failures by analyzing error patterns, classifying severity, and applying retry logic, suppression budgets, and escalation rules. Use when handling errors, troubleshooting failures, recovering from API errors or timeouts, deciding whether to retry or escalate an issue, or managing service outages and tool dependency failures. Applies to any scenario where a check has failed, evidence of success is missing, or an unresolved error needs a structured response. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.
98
Quality: 94% (Does it follow best practices?)
Impact: 100% (1.16x). Average score across 9 eval scenarios.
{
"context": "Tests whether the agent implements a validation checkpoint where unknown/unverifiable errors default to at least operational severity, rather than cosmetic or being ignored.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Unknown defaults to operational+",
"description": "Code assigns 'operational' (or 'critical') — NOT 'cosmetic' — when an error cannot be clearly classified. The fallback tier is operational or higher.",
"max_score": 15
},
{
"name": "Explicit unknown/unverifiable check",
"description": "Code contains an explicit branch or condition for unknown/unclassifiable errors (e.g., an else clause, an 'unknown' case, or a default that is not cosmetic).",
"max_score": 12
},
{
"name": "Three-tier classification",
"description": "Classification logic distinguishes at least three severity levels (cosmetic/minor, operational/moderate, critical/severe or equivalent named tiers)",
"max_score": 10
},
{
"name": "Validation before action",
"description": "Code checks/validates the tier BEFORE determining the action to take (tier is determined first, then action is derived from tier)",
"max_score": 10
},
{
"name": "Ambiguous example classified operational+",
"description": "classify_examples.py includes an ambiguous/unknown input AND its output shows it was assigned operational or critical tier (not cosmetic/benign)",
"max_score": 12
},
{
"name": "Action taken from tier",
"description": "The action (retry, suppress, escalate, halt) is derived from the tier classification — the same tier always produces the same action",
"max_score": 8
},
{
"name": "TriageDecision includes tier",
"description": "TriageDecision (or equivalent return type) includes a field for the assigned tier",
"max_score": 8
},
{
"name": "TriageDecision includes action",
"description": "TriageDecision (or equivalent return type) includes a field for the action taken or recommended",
"max_score": 8
},
{
"name": "DESIGN.md explains fallback",
"description": "DESIGN.md explicitly describes the conservative fallback policy for ambiguous errors (not just listing the tiers)",
"max_score": 10
},
{
"name": "Examples script runs",
"description": "classify_examples.py is a runnable script (has a main block or direct calls) and demonstrates at least 3 different inputs",
"max_score": 7
}
]
}
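
The checklist above describes behavior rather than an implementation, so the following is only a minimal sketch of one way a submission might satisfy it. The `TriageDecision` name and the cosmetic/operational/critical tiers come from the rubric. Everything else is an assumption made for illustration: the `Tier` enum, the keyword marker lists, the `classify` and `triage` function names, the `reason` field, and the particular tier-to-action mapping (suppress, retry, escalate). A real classifier would likely use richer signals than substring matching.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Tier(Enum):
    COSMETIC = "cosmetic"
    OPERATIONAL = "operational"
    CRITICAL = "critical"


# Hypothetical marker lists, chosen only to make the sketch runnable.
CRITICAL_MARKERS = ("data loss", "corrupt", "unauthorized")
OPERATIONAL_MARKERS = ("timeout", "rate limit", "connection refused")
COSMETIC_MARKERS = ("deprecation warning", "trailing whitespace")

# One fixed action per tier, so the same tier always produces the same action.
# The specific pairing is an assumption, not something the rubric prescribes.
ACTION_FOR_TIER = {
    Tier.COSMETIC: "suppress",
    Tier.OPERATIONAL: "retry",
    Tier.CRITICAL: "escalate",
}


@dataclass(frozen=True)
class TriageDecision:
    tier: Tier    # assigned severity tier
    action: str   # action derived from the tier
    reason: str   # short explanation of why the tier was chosen


def classify(message: str) -> Tuple[Tier, str]:
    """Return (tier, reason) for an error message."""
    text = message.lower()
    if any(marker in text for marker in CRITICAL_MARKERS):
        return Tier.CRITICAL, "matched a critical marker"
    if any(marker in text for marker in OPERATIONAL_MARKERS):
        return Tier.OPERATIONAL, "matched an operational marker"
    if any(marker in text for marker in COSMETIC_MARKERS):
        return Tier.COSMETIC, "matched a cosmetic marker"
    # Explicit unknown branch: conservative fallback, never cosmetic.
    return Tier.OPERATIONAL, "unclassifiable error, defaulted to operational"


def triage(message: str) -> TriageDecision:
    # The tier is determined first; the action is then looked up from the tier.
    tier, reason = classify(message)
    return TriageDecision(tier=tier, action=ACTION_FOR_TIER[tier], reason=reason)


if __name__ == "__main__":
    samples = (
        "deprecation warning: old_call() will be removed",
        "Request timeout after 30s",
        "Database corrupt: possible data loss",
        "xyzzy: unexpected state 0x3f",  # ambiguous input
    )
    for sample in samples:
        decision = triage(sample)
        print(f"{sample!r} -> {decision.tier.value} / {decision.action} ({decision.reason})")
```

Keeping the fallback as an explicit final `return` in `classify`, and deriving the action through a single lookup table, makes the conservative default and the tier-to-action relationship easy to audit against the checklist items.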