Idempotent API design — safe retries for POST endpoints, idempotency keys,
93
90%
Does it follow best practices?
Impact
100%
10.00x average score across 4 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent proactively adds idempotency middleware to an order creation endpoint. The task mentions 'spotty connectivity' as context but does NOT say 'add idempotency.' A skilled agent should recognize that spotty connectivity + POST order creation = need for server-side idempotency protection.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Idempotency-Key header read",
"description": "The server reads an idempotency key from the 'Idempotency-Key' request header (or a similar header) and uses it to deduplicate requests",
"max_score": 16
},
{
"name": "Cache lookup before processing",
"description": "Before executing the order creation logic, the server checks whether the idempotency key already exists in a store and returns the cached response if found",
"max_score": 14
},
{
"name": "Response stored after processing",
"description": "After a new request is processed, the response (status code + body) is stored keyed by the idempotency key for future lookups",
"max_score": 10
},
{
"name": "Lock key as processing",
"description": "Before executing the handler, the server marks the idempotency key as 'processing' so concurrent duplicate requests can be detected",
"max_score": 14
},
{
"name": "409 Conflict for concurrent duplicates",
"description": "When a request arrives with an idempotency key that is currently being processed, the server returns 409 Conflict",
"max_score": 14
},
{
"name": "5xx errors not cached",
"description": "Server errors (5xx) are NOT cached in the idempotency store — the key entry is removed so the client can retry",
"max_score": 12
},
{
"name": "TTL on cached entries",
"description": "Cached idempotency entries have a TTL/expiry so they don't accumulate forever",
"max_score": 10
},
{
"name": "Cached status code preserved",
"description": "When returning a cached response, the original HTTP status code is used, not a hardcoded 200",
"max_score": 10
}
]
}
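
The checklist above can be read as one request flow: read the key, look it up, lock it, run the handler, then either store the result or release the key. Below is a minimal, framework-agnostic sketch of that flow in Python. All names here (`IdempotencyStore`, `handle_post`, `TTL_SECONDS`, the 24-hour retention window) are hypothetical and chosen for illustration only; they are not taken from the evaluated task. The store is an in-memory dict and is not thread-safe, so a real deployment would likely need an atomic "set if absent" operation in a shared store instead of the separate lookup and lock steps shown here.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

PROCESSING = "processing"
DONE = "done"
TTL_SECONDS = 24 * 60 * 60  # assumed retention window, not specified by the rubric


@dataclass
class Entry:
    state: str                      # PROCESSING or DONE
    expires_at: float               # clock value after which the entry is ignored
    status: Optional[int] = None    # original HTTP status code
    body: Optional[dict] = None     # original response body


class IdempotencyStore:
    """In-memory store with per-entry expiry (illustrative, not thread-safe)."""

    def __init__(self, ttl: float = TTL_SECONDS, clock: Callable[[], float] = time.monotonic):
        self._entries: Dict[str, Entry] = {}
        self._ttl = ttl
        self._clock = clock

    def get(self, key: str) -> Optional[Entry]:
        entry = self._entries.get(key)
        if entry is not None and entry.expires_at <= self._clock():
            del self._entries[key]  # expired entries are dropped lazily on read
            return None
        return entry

    def lock(self, key: str) -> None:
        self._entries[key] = Entry(PROCESSING, self._clock() + self._ttl)

    def complete(self, key: str, status: int, body: dict) -> None:
        self._entries[key] = Entry(DONE, self._clock() + self._ttl, status, body)

    def release(self, key: str) -> None:
        self._entries.pop(key, None)


def handle_post(
    headers: Dict[str, str],
    handler: Callable[[], Tuple[int, dict]],
    store: IdempotencyStore,
) -> Tuple[int, dict]:
    """Wrap a POST handler that returns (status_code, body)."""
    # Exact-case lookup for brevity; real HTTP header names are case-insensitive.
    key = headers.get("Idempotency-Key")
    if key is None:
        return handler()  # no key supplied: process normally

    entry = store.get(key)
    if entry is not None:
        if entry.state == PROCESSING:
            # A request with the same key is still in flight.
            return 409, {"error": "a request with this key is still being processed"}
        # Replay the stored response with its original status code.
        return entry.status, entry.body

    store.lock(key)  # mark as processing before running the handler
    try:
        status, body = handler()
    except Exception:
        store.release(key)  # unexpected failure: let the client retry
        raise

    if status >= 500:
        store.release(key)  # server errors are not cached
    else:
        store.complete(key, status, body)
    return status, body
```

With this sketch, calling `handle_post` twice with the same `Idempotency-Key` runs the handler once and replays the stored status and body on the second call, while a handler result of 500 or above leaves no entry behind, so a retry is processed as a fresh request.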