Diagnoses and routes failures by analyzing error patterns, classifying severity, and applying retry logic, suppression budgets, and escalation rules. Use when handling errors, troubleshooting failures, recovering from API errors or timeouts, deciding whether to retry or escalate an issue, or managing service outages and tool dependency failures. Applies to any scenario where a check has failed, evidence of success is missing, or an unresolved error needs a structured response. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.
98
Quality: 94% (Does it follow best practices?)
Impact: 100%, 1.16x (Average score across 9 eval scenarios)
{
"context": "Tests whether the agent correctly classifies non-output-affecting warnings as cosmetic tier, applies the correct retry limit of 2 (not 3, not unlimited), and logs after retries are exhausted.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Cosmetic tier label",
"description": "Code or comments label this type of warning as 'cosmetic' (or equivalent lowest-severity tier name)",
"max_score": 12
},
{
"name": "Retry limit of 2",
"description": "Retry logic allows at most 2 retry attempts (not 3, not unlimited) before giving up on a cosmetic failure",
"max_score": 15
},
{
"name": "Log after retries exhausted",
"description": "After retry attempts are exhausted, code logs the warning (writes to log, prints warning, or calls a log function) and continues rather than escalating or raising",
"max_score": 12
},
{
"name": "No escalation for cosmetic",
"description": "Code does NOT escalate (halt processing, raise exception, or trigger on-call alert) for cosmetic failures after retries are exhausted",
"max_score": 10
},
{
"name": "Tier-based branching",
"description": "Code contains branching logic (if/elif/switch or dispatch) that handles different severity levels differently, not just a single uniform retry policy",
"max_score": 10
},
{
"name": "Report structure present",
"description": "Module includes a report/result structure (dataclass, namedtuple, dict) capturing the failure signal and action taken",
"max_score": 8
},
{
"name": "Report includes tier",
"description": "Report structure includes a field representing the tier or severity level assigned",
"max_score": 8
},
{
"name": "POLICY.md present",
"description": "POLICY.md file exists and describes the retry/log strategy",
"max_score": 10
},
{
"name": "Condition: output still correct",
"description": "Code or comments capture the condition for cosmetic classification: the output remains correct and complete despite the warning",
"max_score": 8
},
{
"name": "No suppression budget for cosmetic",
"description": "Code does NOT implement a suppression budget (time window tracking, recurrence store) for cosmetic failures — suppression budget is reserved for operational tier",
"max_score": 7
}
]
}
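
To make the checklist concrete, here is a minimal Python sketch of a handler that would plausibly satisfy the code-related items for the cosmetic tier. It is an illustration under stated assumptions, not the skill's actual implementation. The names `Tier`, `FailureReport`, `classify`, `handle_failure`, and `COSMETIC_MAX_RETRIES` are hypothetical, and treating every non-cosmetic failure as operational is a simplification, since the rubric names only those two tiers.

```python
import logging
from dataclasses import dataclass
from enum import Enum
from typing import Callable

logger = logging.getLogger("failure_router")  # hypothetical logger name

COSMETIC_MAX_RETRIES = 2  # the rubric requires at most 2 retries for cosmetic failures


class Tier(Enum):
    COSMETIC = "cosmetic"        # lowest-severity tier named in the rubric
    OPERATIONAL = "operational"  # named in the rubric; its policy is not sketched here


@dataclass
class FailureReport:
    signal: str        # the failure signal that was observed
    tier: Tier         # severity tier assigned to the failure
    action: str        # action taken by the handler
    retries_used: int  # number of retry attempts actually made


def classify(output_correct: bool) -> Tier:
    # Cosmetic only when the output is still correct and complete despite the warning.
    # Assumption: everything else is treated as operational in this sketch.
    return Tier.COSMETIC if output_correct else Tier.OPERATIONAL


def handle_failure(signal: str, output_correct: bool,
                   retry: Callable[[], bool]) -> FailureReport:
    """Route a failure by tier. `retry` returns True when the warning clears."""
    tier = classify(output_correct)

    if tier is Tier.COSMETIC:
        for attempt in range(1, COSMETIC_MAX_RETRIES + 1):
            if retry():
                return FailureReport(signal, tier, "resolved_on_retry", attempt)
        # Retries exhausted: log and continue. No escalation, no exception,
        # and no suppression budget for the cosmetic tier.
        logger.warning("cosmetic warning persists after %d retries: %s",
                       COSMETIC_MAX_RETRIES, signal)
        return FailureReport(signal, tier, "logged_and_continued",
                             COSMETIC_MAX_RETRIES)

    # Non-cosmetic tiers follow a different policy (for example, a suppression
    # budget for the operational tier), which this sketch leaves out.
    return FailureReport(signal, tier, "deferred_to_tier_policy", 0)
```

A caller would pass in the failure signal, whether the output is still correct, and a zero-argument callable that re-runs the failed step. The returned `FailureReport` records both the tier and the action taken, which corresponds to the two report-structure items. The "POLICY.md present" item concerns a separate file and is outside what this code covers.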