Creates boundary-point validation contracts, defines invariant-based success criteria, and sets up automated verification probes so reliability workflows trigger on objective evidence rather than intuition. Use when designing robust handoff, memory-persistence, or tool-call reliability workflows; when you need to verify handoffs work, check memory persistence, validate tool calls succeeded, or convert vague reliability goals into concrete, testable checks at each boundary point with explicit failure-class mapping (operational vs. critical); or when you want to test your workflow end-to-end, make sure it works, or verify your automation runs correctly using read-back probes and escalation triggers rather than agent confidence. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.
Score: 96
Quality: 90% (Does it follow best practices?)
Impact: 98%, 1.25x (Average score across 9 eval scenarios)
An infrastructure automation team is building a state machine that provisions cloud resources across multiple stages: network setup, database provisioning, application deployment, and post-deployment verification. Each stage transition involves writing state to a shared configuration store, calling provisioning APIs, and reading back results.
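The write-then-read-back step at each stage transition could be sketched roughly as follows. This is a minimal sketch under stated assumptions: a plain dict stands in for the shared configuration store, and `write_state_with_readback`, `ReadBackMismatch`, and the example key and values are hypothetical names chosen for illustration, not part of any real provisioning API.

```python
import copy


class ReadBackMismatch(Exception):
    """Raised when the stored state differs from what was written."""


def write_state_with_readback(store, key, state):
    """Write `state` under `key`, then read it back and compare.

    `store` is a dict-like object standing in for the shared
    configuration store (an assumption for illustration only).
    """
    store[key] = copy.deepcopy(state)
    observed = store.get(key)  # read-back probe: trust what is stored, not the write call
    if observed != state:
        raise ReadBackMismatch(f"read-back for {key!r} did not match the write")
    return observed


store = {}
result = write_state_with_readback(
    store, "network_setup", {"status": "done", "vpc_id": "vpc-example"}
)
```

The point of the probe is that success is decided by what the store returns, not by the absence of an error during the write.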
The team has been burned by treating all failures equally — some failures are transient and safe to retry, while others indicate data corruption or missing critical resources that require human intervention. They want a formal classification scheme embedded in their contract documentation that clearly distinguishes which invariant violations are operational (safe to retry/recover automatically) versus critical (must halt and escalate immediately). This classification will be used to drive the state machine's error-handling logic.
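One way the classification could drive the state machine's error handling is sketched below. Everything here is an assumption made for illustration: the names `FailureClass`, `InvariantViolation`, `EscalationRequired`, and `run_stage`, the default of three attempts, and the choice to escalate once retries are exhausted.

```python
from enum import Enum


class FailureClass(Enum):
    OPERATIONAL = "operational"  # safe to retry or recover automatically
    CRITICAL = "critical"        # must halt and escalate immediately


class InvariantViolation(Exception):
    """An invariant check failed; carries the failure class from the contract."""

    def __init__(self, message, failure_class):
        super().__init__(message)
        self.failure_class = failure_class


class EscalationRequired(Exception):
    """Signals that the state machine must halt for human intervention."""


def run_stage(stage, max_attempts=3):
    """Run `stage` (a zero-argument callable), retrying only operational failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return stage()
        except InvariantViolation as exc:
            # Critical violations are never retried.
            if exc.failure_class is FailureClass.CRITICAL:
                raise EscalationRequired(str(exc)) from exc
            # Assumption: an operational failure that outlives its retry
            # budget is escalated rather than retried forever.
            if attempt == max_attempts:
                raise EscalationRequired(
                    f"retries exhausted after {max_attempts} attempts: {exc}"
                ) from exc
```

Keeping the class on the exception means the retry loop never has to guess from an error message whether a failure is safe to retry.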
Produce the following files:
contract.md: A detectability contract with a boundary table covering at least four boundary points in this state machine: a state write, an external API call, a resume/readiness check, and a final verification. For each boundary, define the invariants, the failure class for each invariant violation, and the escalation trigger. Failure classifications must clearly use 'operational' and 'critical' labels.
failure_taxonomy.md: A short classification guide (can be a table or bulleted list) that maps each type of invariant violation to its failure class and explains the reasoning. Include: missing artifact, bad schema, stale timestamp, non-2xx response, missing required fields, and unverifiable/unknown state.
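For illustration only, a taxonomy like the one requested for failure_taxonomy.md could be mirrored in code as a lookup table. The class assigned to each violation type below is an assumption, not the required answer, and the names `TAXONOMY` and `classify` are hypothetical. The one deliberate design choice is that any unrecognized violation falls back to critical.

```python
from enum import Enum


class FailureClass(Enum):
    OPERATIONAL = "operational"
    CRITICAL = "critical"


# Illustrative assignments only (assumptions); the authoritative mapping
# and its reasoning belong in failure_taxonomy.md.
TAXONOMY = {
    "missing_artifact": FailureClass.CRITICAL,
    "bad_schema": FailureClass.CRITICAL,
    "stale_timestamp": FailureClass.OPERATIONAL,
    "non_2xx_response": FailureClass.OPERATIONAL,
    "missing_required_fields": FailureClass.CRITICAL,
    "unknown_state": FailureClass.CRITICAL,
}


def classify(violation):
    """Look up a violation type; anything unrecognized is treated as critical."""
    return TAXONOMY.get(violation, FailureClass.CRITICAL)
```

Defaulting to critical means a violation type nobody anticipated halts the state machine instead of being retried silently.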