Every external call needs a timeout, every timeout needs a fallback — resilience patterns for HTTP, databases, and third-party services
{
"instruction": "Retry transient failures with exponential backoff and jitter; never retry non-transient errors",
"relevant_when": "Agent writes code that retries failed HTTP requests or external service calls",
"context": "Retries without backoff cause a thundering herd that makes outages worse. Retries without jitter cause synchronized retry spikes. Retrying non-transient errors (HTTP 400, 404, 422) wastes resources and never succeeds. Retry logic should: use exponential backoff (e.g., 1s, 2s, 4s), add random jitter to prevent synchronized retries, cap the number of attempts (typically 3), and only retry on transient errors (429, 502, 503, 504, network errors).",
"sources": [
{
"type": "file",
"filename": "skills/graceful-degradation/SKILL.md",
"tile": "tessl-labs/graceful-degradation@0.2.0"
}
],
"checklist": [
{
"name": "backoff-is-exponential",
"rule": "Retry delays increase exponentially between attempts (e.g., baseDelay * 2^attempt) rather than using fixed delays",
"relevant_when": "Agent implements retry logic for external calls"
},
{
"name": "jitter-added-to-delay",
"rule": "Retry delays include random jitter (e.g., delay * random factor) to prevent synchronized retry storms across instances",
"relevant_when": "Agent implements retry logic with backoff"
},
{
"name": "only-transient-errors-retried",
"rule": "Retry logic checks whether the error is transient (429, 502, 503, 504, network errors) before retrying. Non-transient errors like 400, 401, 403, 404, 422 are not retried.",
"relevant_when": "Agent implements retry logic for HTTP calls"
},
{
"name": "max-attempts-capped",
"rule": "Retry logic has a maximum number of attempts (typically 3) and throws or returns a fallback after exhausting retries",
"relevant_when": "Agent implements retry logic"
}
]
}
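
The four checklist items above can be sketched together in one small retry helper. This is a minimal illustration under stated assumptions, not the implementation from the skill file: `HTTPStatusError`, `is_transient`, `backoff_delay`, and `call_with_retry` are hypothetical names introduced here, and the jitter factor (a random multiplier between 0.5 and 1.0) is one common choice among several.

```python
import random
import time

# Status codes the checklist treats as transient.
TRANSIENT_STATUSES = {429, 502, 503, 504}


class HTTPStatusError(Exception):
    """Hypothetical error type carrying an HTTP status code (an assumption)."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_transient(exc):
    """Return True only for errors worth retrying."""
    if isinstance(exc, HTTPStatusError):
        return exc.status in TRANSIENT_STATUSES
    # Built-in network-level failures: resets, refused connections, timeouts.
    return isinstance(exc, (ConnectionError, TimeoutError))


def backoff_delay(attempt, base_delay=1.0):
    """Exponential delay (base * 2^attempt) scaled by a jitter factor in [0.5, 1.0)."""
    return base_delay * (2 ** attempt) * (0.5 + random.random() / 2)


def call_with_retry(operation, max_attempts=3, base_delay=1.0,
                    sleep=time.sleep, fallback=None):
    """Call operation(), retrying transient failures up to max_attempts times."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            if not is_transient(exc):
                raise  # 400, 401, 403, 404, 422, ...: retrying cannot help
            if attempt == max_attempts - 1:
                if fallback is not None:
                    return fallback()  # retries exhausted: degrade gracefully
                raise
            sleep(backoff_delay(attempt, base_delay))
```

A caller would wrap the external call in a zero-argument function, for example `call_with_retry(lambda: fetch_profile(user_id), fallback=lambda: None)`, where `fetch_profile` stands in for whatever client call is being protected. Passing `sleep` as a parameter keeps the helper testable without real delays.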