Diagnoses and routes failures by analyzing error patterns, classifying severity, and applying retry logic, suppression budgets, and escalation rules. Use when handling errors, troubleshooting failures, recovering from API errors or timeouts, deciding whether to retry or escalate an issue, or managing service outages and tool dependency failures. Applies to any scenario where a check has failed, evidence of success is missing, or an unresolved error needs a structured response. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.
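Here is a minimal Python sketch of the routing this summary describes. It assumes three severity tiers whose names mirror the labels used in the rubric below (`critical`, `operational`, `cosmetic`). `Router`, `EscalationRequired`, `classify_by_type`, and every numeric default (three retries, a 0.5 s base delay, a budget of five suppressed errors) are hypothetical. This is not the skill's own implementation.

```python
# Hypothetical router: tier names mirror the rubric ("critical", "operational",
# "cosmetic"); retry counts, delays, and the budget size are illustrative only.
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class Tier(Enum):
    CRITICAL = "critical"        # data loss or integrity: escalate at once, never retry or suppress
    OPERATIONAL = "operational"  # transient (timeouts, refused connections): bounded retry
    COSMETIC = "cosmetic"        # low impact: suppress while the budget lasts


class EscalationRequired(Exception):
    """Halts autonomous processing and hands the failure to a human."""


def classify_by_type(exc: Exception) -> Tier:
    """Example classifier; real rules would also inspect error codes and messages."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return Tier.OPERATIONAL
    if isinstance(exc, UserWarning):
        return Tier.COSMETIC
    return Tier.CRITICAL  # unknown errors fall through to the safest tier


@dataclass
class Router:
    classify: Callable[[Exception], Tier] = classify_by_type
    max_retries: int = 3
    base_delay: float = 0.5  # seconds; doubled after each failed attempt
    suppression_budget: int = 5
    suppressed: List[Exception] = field(default_factory=list)

    def run(self, op: Callable[[], T]) -> Optional[T]:
        attempt = 0
        while True:
            try:
                return op()
            except Exception as exc:
                tier = self.classify(exc)
                if tier is Tier.CRITICAL:
                    raise EscalationRequired(f"critical: {exc!r}") from exc
                if tier is Tier.COSMETIC:
                    if len(self.suppressed) < self.suppression_budget:
                        self.suppressed.append(exc)  # recorded for the report, not dropped
                        return None
                    raise EscalationRequired(f"suppression budget exhausted: {exc!r}") from exc
                attempt += 1  # operational: retry with exponential backoff
                if attempt > self.max_retries:
                    raise EscalationRequired(f"retries exhausted: {exc!r}") from exc
                time.sleep(self.base_delay * 2 ** (attempt - 1))
```

For example, with a hypothetical `fetch` callable, `Router(base_delay=0).run(fetch)` returns whatever `fetch` returns after at most three retries on `TimeoutError`. It raises `EscalationRequired` on the first unclassified exception. Sending unknown errors to the critical tier is a deliberate fail-safe in this sketch: an error nobody anticipated is escalated rather than retried or suppressed.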
98
Quality: 94% (Does it follow best practices?)
Impact: 100%, 1.16x (average score across 9 eval scenarios)
{
  "context": "Tests whether the agent treats round-trip verification failures as critical-tier, immediately halts further autonomous processing, escalates without retrying, and never suppresses data-loss signals.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Round-trip verification",
      "description": "Code performs a read-back check after a write and compares the result to the expected written value",
      "max_score": 10
    },
    {
      "name": "Critical tier classification",
      "description": "Code or comments label the read-back mismatch as 'critical' (or the highest-severity equivalent), not operational or cosmetic",
      "max_score": 12
    },
    {
      "name": "Halt autonomous processing",
      "description": "On verification failure, code stops further processing for this path (raises exception, returns error state, or otherwise prevents continuation) rather than logging and continuing",
      "max_score": 12
    },
    {
      "name": "Immediate escalation",
      "description": "On verification failure, code escalates immediately (calls escalation function, raises, or sets escalation flag) without retry attempts first",
      "max_score": 10
    },
    {
      "name": "No retry on data-loss",
      "description": "Code does NOT retry the write operation after a verification mismatch is detected (no retry loop or fallback write on mismatch)",
      "max_score": 12
    },
    {
      "name": "No suppression of data-loss",
      "description": "Code does NOT swallow or suppress the verification failure (no bare except, no silent continue, no suppression budget applied to this failure type)",
      "max_score": 10
    },
    {
      "name": "Evidence in report",
      "description": "Report structure includes the failure signal AND observed evidence (e.g., expected vs actual values, or equivalent mismatch information)",
      "max_score": 8
    },
    {
      "name": "Escalation status in report",
      "description": "Report structure includes escalation status (e.g., a field indicating escalated=True or equivalent)",
      "max_score": 8
    },
    {
      "name": "Tests present",
      "description": "test_write_guard.py contains at least 2 test cases covering the verification failure scenario",
      "max_score": 10
    },
    {
      "name": "Tier label in code",
      "description": "Code includes an explicit string or constant referencing the tier (e.g., 'critical' appears as a value or comment in the classification logic)",
      "max_score": 8
    }
  ]
}
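Here is a minimal Python sketch of code that could satisfy this checklist. It assumes a key-value store exposing `write(key, value)` and `read(key)`. `Store`, `FailureReport`, `VerificationFailure`, `guarded_write`, `DroppingStore`, and the injected `escalate` callback are hypothetical names, not part of any real library. The two test functions show what a `test_write_guard.py` covering the mismatch scenario might contain.

```python
# Hypothetical write guard: Store, FailureReport, VerificationFailure,
# guarded_write, and the escalate callback are illustrative names, not a real API.
from dataclasses import dataclass
from typing import Any, Callable, Dict, Protocol

CRITICAL = "critical"  # tier label: a read-back mismatch signals possible data loss


class Store(Protocol):
    def write(self, key: str, value: Any) -> None: ...
    def read(self, key: str) -> Any: ...


@dataclass(frozen=True)
class FailureReport:
    signal: str      # what failed
    tier: str        # always CRITICAL for round-trip mismatches
    key: str
    expected: Any    # evidence: the value that was written
    actual: Any      # evidence: the value that was read back
    escalated: bool  # escalation status


class VerificationFailure(Exception):
    """Halts this write path; must not be swallowed or counted against a suppression budget."""

    def __init__(self, report: FailureReport) -> None:
        super().__init__(
            f"{report.signal}: {report.key!r} expected={report.expected!r} "
            f"actual={report.actual!r}"
        )
        self.report = report


def guarded_write(
    store: Store, key: str, value: Any, escalate: Callable[[FailureReport], None]
) -> None:
    store.write(key, value)   # exactly one write; no retry loop on mismatch
    actual = store.read(key)  # round-trip read-back
    if actual == value:
        return
    report = FailureReport("round_trip_mismatch", CRITICAL, key, value, actual, escalated=True)
    escalate(report)                   # escalate immediately, with no retry first
    raise VerificationFailure(report)  # stop autonomous processing for this path


# --- test_write_guard.py (sketch) ---
class DroppingStore:
    """Test double that counts writes but persists nothing (simulated data loss)."""

    def __init__(self) -> None:
        self.writes = 0
        self.data: Dict[str, Any] = {}

    def write(self, key: str, value: Any) -> None:
        self.writes += 1  # the write is "acknowledged" but silently dropped

    def read(self, key: str) -> Any:
        return self.data.get(key)


def test_mismatch_is_critical_escalated_and_not_retried() -> None:
    store, escalations = DroppingStore(), []
    try:
        guarded_write(store, "k", "v", escalations.append)
        raise AssertionError("a mismatch must halt the write path")
    except VerificationFailure as exc:
        assert exc.report.tier == CRITICAL and exc.report.escalated
        assert (exc.report.expected, exc.report.actual) == ("v", None)
    assert store.writes == 1 and len(escalations) == 1


def test_mismatch_stops_later_writes() -> None:
    store, processed, halted = DroppingStore(), [], False
    try:
        for key in ("a", "b", "c"):
            guarded_write(store, key, 1, lambda report: None)
            processed.append(key)
    except VerificationFailure:
        halted = True  # the test observes the halt; production code would not catch here
    assert halted and processed == [] and store.writes == 1
```

Raising after escalating, rather than returning an error value, makes the halt hard to ignore by accident. A caller that wants to keep processing unrelated paths has to catch `VerificationFailure` explicitly, and the escalated report still travels with the exception.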