Diagnoses and routes failures by analyzing error patterns, classifying severity, and applying retry logic, suppression budgets, and escalation rules. Use when handling errors, troubleshooting failures, recovering from API errors or timeouts, deciding whether to retry or escalate an issue, or managing service outages and tool dependency failures. Applies to any scenario where a check has failed, evidence of success is missing, or an unresolved error needs a structured response. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.
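The classify-then-route flow described above can be sketched as follows. This is a minimal illustration, not the skill's implementation. The tier names, the `TransientError`/`WriteFailure`/`Escalation` types, and the backoff schedule are all hypothetical. Two rules come from the eval checklist: the 3-retry cap for the operational tier, and the requirement that write failures are never retried.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical tier labels; the skill's real taxonomy may differ.
OPERATIONAL = "operational"        # transient: rate limiting, service unavailable
DATA_INTEGRITY = "data_integrity"  # write failures / possible data loss
UNKNOWN = "unknown"                # unclassified: escalate rather than guess

MAX_RETRIES = 3  # operational tier: up to 3 retries after the first attempt


class TransientError(Exception):
    """Hypothetical error for rate-limited or unavailable upstream APIs."""


class WriteFailure(Exception):
    """Hypothetical error for failed writes or integrity violations."""


class Escalation(Exception):
    """Raised when a failure must be handed to a human or higher-level handler."""

    def __init__(self, tier: str, signal: str, attempts: int) -> None:
        super().__init__(f"[{tier}] {signal} after {attempts} attempt(s)")
        self.tier = tier
        self.signal = signal
        self.attempts = attempts


def classify(exc: Exception) -> str:
    """Map an exception to a severity tier."""
    if isinstance(exc, TransientError):
        return OPERATIONAL
    if isinstance(exc, WriteFailure):
        return DATA_INTEGRITY
    return UNKNOWN


def run_with_retry(op: Callable[[], T], backoff_s: float = 0.5) -> T:
    """Call op(); retry only operational failures, escalate everything else."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return op()
        except Exception as exc:
            tier = classify(exc)
            if tier == OPERATIONAL and attempts <= MAX_RETRIES:
                # Exponential backoff before the next retry.
                time.sleep(backoff_s * 2 ** (attempts - 1))
                continue
            # Data-integrity and unknown failures halt immediately; operational
            # failures escalate once the retry budget is exhausted.
            raise Escalation(tier, repr(exc), attempts) from exc
```

With this sketch, a transient call that always fails is invoked 4 times in total (1 attempt plus 3 retries) before `Escalation` is raised. A `WriteFailure` escalates after the first call, with no retry.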
98
Quality: 94% (Does it follow best practices?)
Impact: 100% (1.16x average score across 9 eval scenarios)
{
"context": "Tests whether the agent correctly implements operational tier retry logic (up to 3 retries) and a suppression budget pattern for recurring failures, with auto-escalation when thresholds are exceeded.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Operational retry limit",
"description": "Retry logic for transient API errors (rate limiting/service unavailable) allows at most 3 retries, with the cap set to exactly 3 rather than a higher or lower number (e.g. max_retries=3 or equivalent)",
"max_score": 12
},
{
"name": "Suppression budget store",
"description": "Code includes a persistent/in-memory store (dict, database, file, etc.) that tracks failure records across invocations, keyed by failure type or identifier",
"max_score": 10
},
{
"name": "Recurrence count tracking",
"description": "The budget store tracks a count (or equivalent integer counter) that increments with each recurrence of the same failure",
"max_score": 10
},
{
"name": "Time window tracking",
"description": "The budget store tracks first_seen timestamp (or equivalent) to measure elapsed time since failure first occurred",
"max_score": 8
},
{
"name": "Auto-escalate on threshold",
"description": "Code escalates (raises, calls an escalation function, or logs the escalation) AND clears/resets the budget record when either the count threshold OR the time window is exceeded",
"max_score": 12
},
{
"name": "Suppress within budget",
"description": "When thresholds are NOT exceeded, code suppresses/retries (does not escalate) and updates the store record",
"max_score": 8
},
{
"name": "Escalation does not suppress",
"description": "After escalation is triggered, the budget record is cleared (not kept), ensuring the failure is not subsequently suppressed",
"max_score": 8
},
{
"name": "Structured triage output",
"description": "Module includes a report structure (dataclass, dict, namedtuple, etc.) that captures at minimum: the failure signal AND the action taken",
"max_score": 8
},
{
"name": "Tier label used",
"description": "Code or comments reference a tier classification (e.g. 'operational'), or the code at least distinguishes severity levels with named constants/strings",
"max_score": 7
},
{
"name": "No suppress on data-loss",
"description": "Code does NOT apply suppression/retry logic to write-failure or data-integrity error scenarios; if such scenarios are included, they are handled differently (halt/escalate)",
"max_score": 7
},
{
"name": "README present",
"description": "README.md file exists and contains basic usage instructions",
"max_score": 10
}
]
}
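The suppression-budget items in the checklist could be satisfied by something like the sketch below. Those items are the store, the recurrence count, `first_seen`, escalate-and-clear, no suppression for data-integrity failures, and a structured triage report. The sketch assumes an in-memory dict keyed by a failure identifier. The thresholds `MAX_RECURRENCES = 5` and `WINDOW_S = 3600` are hypothetical, because the checklist does not fix them. The `"escalate"` and `"halt"` actions stand in for whatever escalation hook a real implementation would call.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

MAX_RECURRENCES = 5  # hypothetical count threshold
WINDOW_S = 3600.0    # hypothetical time window, in seconds

# Hypothetical tier labels.
OPERATIONAL = "operational"
DATA_INTEGRITY = "data_integrity"


@dataclass
class BudgetRecord:
    count: int         # recurrences of this failure inside the current budget
    first_seen: float  # clock reading when the failure first appeared


@dataclass
class TriageReport:
    signal: str  # failure identifier that was observed
    tier: str    # severity tier label
    action: str  # "suppress", "escalate", or "halt"
    count: int   # recurrence count when the decision was made


class SuppressionBudget:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._store: Dict[str, BudgetRecord] = {}
        self._clock = clock  # injectable so the time window can be tested

    def record(self, key: str, tier: str) -> TriageReport:
        # Data-integrity failures never enter the budget: halt immediately.
        if tier == DATA_INTEGRITY:
            self._store.pop(key, None)
            return TriageReport(key, tier, "halt", 0)

        now = self._clock()
        rec = self._store.get(key)
        if rec is None:
            rec = BudgetRecord(count=0, first_seen=now)
            self._store[key] = rec
        rec.count += 1

        over_count = rec.count > MAX_RECURRENCES
        over_window = now - rec.first_seen > WINDOW_S
        if over_count or over_window:
            # Escalate and clear the record so the failure is not suppressed again.
            del self._store[key]
            return TriageReport(key, tier, "escalate", rec.count)
        # Within budget: suppress and keep the updated record.
        return TriageReport(key, tier, "suppress", rec.count)

    def get(self, key: str) -> Optional[BudgetRecord]:
        return self._store.get(key)
```

The injectable clock keeps the time-window check testable. For the budget to survive across invocations, the dict would need to be replaced by a file or database.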