Diagnoses and routes failures by analyzing error patterns, classifying severity, and applying retry logic, suppression budgets, and escalation rules. Use when handling errors, troubleshooting failures, recovering from API errors or timeouts, deciding whether to retry or escalate an issue, or managing service outages and tool dependency failures. Applies to any scenario where a check has failed, evidence of success is missing, or an unresolved error needs a structured response. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.
Apply triage based on failed checks, not intuition.
Run triage when any condition is true:
Follow this sequence for every failure event:
Validation checkpoint: do not advance from step 3 to step 4 if the tier is unknown or unverifiable — treat as at least operational.
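The checkpoint above can be sketched as a small guard. This is a minimal illustration, not the skill's actual implementation: the `Tier` enum (named after the cosmetic, operational, and critical tiers this document uses), its numeric ordering, the `verified` flag, and the `effective_tier` function are all assumptions introduced here.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    # Assumed ordering: a higher value means a more severe tier.
    COSMETIC = 1
    OPERATIONAL = 2
    CRITICAL = 3


def effective_tier(classified: Optional[Tier], verified: bool) -> Tier:
    """Return the tier to carry past the validation checkpoint.

    `classified` is None when the tier is unknown; `verified` is False
    when the classification could not be confirmed (hypothetical flag).
    """
    if classified is None or not verified:
        # Unknown or unverifiable: never drop below operational,
        # but keep a higher tier if one was already suggested.
        return max(classified or Tier.OPERATIONAL, Tier.OPERATIONAL)
    return classified
```

For example, an unverified cosmetic classification is raised to operational, while a verified cosmetic classification passes through unchanged.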
Pseudocode — suppression budget check:
function check_suppression_budget(failure_key):
    record = budget_store.get(failure_key, {count: 0, first_seen: now()})
    record.count += 1
    elapsed = now() - record.first_seen
    if record.count > MAX_RECURRENCE or elapsed > MAX_WINDOW:
        escalate(failure_key, record)
        budget_store.clear(failure_key)
    else:
        suppress(failure_key, record)
        budget_store.set(failure_key, record)

Advanced implementations: For projects requiring multiple storage backends (in-memory, Redis, database) or custom threshold configurations per failure type, consider extracting this pattern into a dedicated reference file (e.g., SUPPRESSION_BUDGET.md) alongside the skill.
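For readers who want something runnable, the pseudocode above might translate to Python roughly as follows. This is a sketch under stated assumptions: the threshold values, the in-memory dict standing in for `budget_store`, the injectable clock, and the string return values (used here in place of the `escalate` and `suppress` calls) are all illustrative choices, not part of the skill.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict

MAX_RECURRENCE = 3      # assumed value: recurrences tolerated before escalating
MAX_WINDOW = 3600.0     # assumed value: seconds a failure may stay suppressed


@dataclass
class BudgetRecord:
    count: int
    first_seen: float


class SuppressionBudget:
    """In-memory stand-in for the pseudocode's budget_store (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._store: Dict[str, BudgetRecord] = {}

    def check(self, failure_key: str) -> str:
        now = self._clock()
        # A first occurrence starts a fresh record at the current time.
        record = self._store.get(failure_key)
        if record is None:
            record = BudgetRecord(count=0, first_seen=now)
        record.count += 1
        elapsed = now - record.first_seen
        if record.count > MAX_RECURRENCE or elapsed > MAX_WINDOW:
            # Budget exhausted: clear the record and tell the caller to escalate.
            self._store.pop(failure_key, None)
            return "escalate"
        # Still within budget: persist the updated record and suppress.
        self._store[failure_key] = record
        return "suppress"
```

Passing the clock in as a parameter keeps the window check testable without sleeping; a caller would map the returned string onto whatever escalation or suppression action the project defines.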
Example 1 — cosmetic
Example 2 — operational
Example 3 — critical