Diagnoses and routes failures by analyzing error patterns, classifying severity, and applying retry logic, suppression budgets, and escalation rules. Use when handling errors, troubleshooting failures, recovering from API errors or timeouts, deciding whether to retry or escalate an issue, or managing service outages and tool dependency failures. Applies to any scenario where a check has failed, evidence of success is missing, or an unresolved error needs a structured response. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.
98
Quality
94%
Does it follow best practices?
Impact
100%
1.16xAverage score across 9 eval scenarios
A payments startup has built a checkout service that calls a third-party payment processor API. The API sometimes responds with HTTP 429 (Too Many Requests) during peak traffic, and other times with intermittent 503 (Service Unavailable) responses. The engineering team has noticed that their current error handling is too aggressive — it either gives up immediately on the first error or retries indefinitely, causing issues on both ends.
The team needs a Python error handling module that handles these transient failures gracefully. The module will be used by their checkout service and must correctly handle the situation where the same failure keeps recurring over a 24-hour window, eventually escalating to on-call engineers rather than silently retrying forever.
Write a Python module error_handler.py that implements error handling for the payment API integration. The module should:
TriageReport dataclass or similar structure for documenting what happenedREADME.md describing how to use the moduleThe module should be self-contained and importable (no external API keys or services required to import).