Diagnoses and routes failures by analyzing error patterns, classifying severity, and applying retry logic, suppression budgets, and escalation rules. Use when handling errors, troubleshooting failures, recovering from API errors or timeouts, deciding whether to retry or escalate an issue, or managing service outages and tool dependency failures. Applies to any scenario where a check has failed, evidence of success is missing, or an unresolved error needs a structured response. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.
98
Quality: 94% (Does it follow best practices?)
Impact: 100%, 1.16x (Average score across 9 eval scenarios)
{
"context": "Tests whether the agent correctly triggers triage on the 'evidence missing' condition (cannot prove success), follows the triage workflow sequence, and produces a structured output.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Evidence-missing trigger",
"description": "Code explicitly checks whether success can be confirmed (e.g., checks for presence of expected outputs) and treats missing/incomplete outputs as a triage-triggering condition, not just a warning",
"max_score": 12
},
{
"name": "Evidence collection step",
"description": "Code collects multiple pieces of evidence before classifying the failure: at minimum, which files are present/absent AND the overall completeness state",
"max_score": 10
},
{
"name": "Tier classification present",
"description": "Code or triage_result.json includes an explicit tier or severity classification for the missing-evidence scenario (cosmetic, operational, or critical label)",
"max_score": 10
},
{
"name": "Action derived from tier",
"description": "The action taken (retry build, escalate, log, halt) is determined by the tier assigned — not hardcoded independently of classification",
"max_score": 8
},
{
"name": "triage_result.json: failure signal",
"description": "triage_result.json includes a field representing the failure signal (e.g., 'missing_outputs', 'evidence_missing', or equivalent)",
"max_score": 8
},
{
"name": "triage_result.json: evidence observed",
"description": "triage_result.json includes a field capturing observed evidence (e.g., list of present/missing files, or directory contents)",
"max_score": 8
},
{
"name": "triage_result.json: tier assigned",
"description": "triage_result.json includes a field for the tier or severity assigned",
"max_score": 8
},
{
"name": "triage_result.json: action taken",
"description": "triage_result.json includes a field for the action that was taken in response",
"max_score": 8
},
{
"name": "triage_result.json: escalation status",
"description": "triage_result.json includes a field for escalation status (escalated: true/false, or equivalent)",
"max_score": 8
},
{
"name": "Expected files list used",
"description": "Script reads or uses the expected_files.txt input to determine which files to check for (not a hardcoded list in the script)",
"max_score": 10
},
{
"name": "INTEGRATION.md present",
"description": "INTEGRATION.md file exists and describes how to use the script in a CI pipeline",
"max_score": 10
}
]
}
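
A script that would satisfy this checklist might look roughly like the sketch below. It is a minimal illustration, not the skill's actual implementation. The field names in `triage_result.json`, the `ACTION_BY_TIER` mapping, and the rule that partial output is "operational" while fully absent output is "critical" are all assumptions made for this example. Only the tier labels (cosmetic, operational, critical), the `expected_files.txt` input, and the `triage_result.json` output come from the checklist itself.

```python
import json
from pathlib import Path
from typing import Optional

# Assumed tier -> action mapping. The tier labels come from the checklist;
# which action belongs to which tier is a guess for illustration only.
ACTION_BY_TIER = {
    "cosmetic": "log",
    "operational": "retry_build",
    "critical": "escalate",
}


def collect_evidence(expected_list: Path, output_dir: Path) -> dict:
    """Record which expected files are present or absent, plus overall completeness."""
    # The expected names are read from the input file, not hardcoded.
    expected = [ln.strip() for ln in expected_list.read_text().splitlines() if ln.strip()]
    present = [name for name in expected if (output_dir / name).is_file()]
    missing = [name for name in expected if name not in present]
    if not missing:
        completeness = "complete"
    elif present:
        completeness = "partial"
    else:
        completeness = "empty"
    return {"present": present, "missing": missing, "completeness": completeness}


def classify(evidence: dict) -> Optional[str]:
    """Return a tier, or None when success is confirmed (assumed rule)."""
    if evidence["completeness"] == "complete":
        return None
    if evidence["completeness"] == "empty":
        return "critical"
    return "operational"


def triage(expected_list: Path, output_dir: Path, result_path: Path) -> Optional[dict]:
    """Treat missing evidence as a triage trigger and write a structured result."""
    evidence = collect_evidence(expected_list, output_dir)
    tier = classify(evidence)
    if tier is None:
        return None  # all expected outputs exist; nothing to triage
    action = ACTION_BY_TIER[tier]  # action is derived from the tier, not hardcoded
    result = {
        "failure_signal": "evidence_missing",
        "evidence_observed": evidence,
        "tier": tier,
        "action_taken": action,
        "escalated": action == "escalate",
    }
    result_path.write_text(json.dumps(result, indent=2))
    return result
```

Calling `triage(Path("expected_files.txt"), Path("build"), Path("triage_result.json"))` returns `None` when every expected file exists, and otherwise writes the five fields the checklist asks for. The `build` directory name is hypothetical.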