Creates boundary-point validation contracts, defines invariant-based success criteria, and sets up automated verification probes so reliability workflows trigger on objective evidence rather than intuition. Use when designing robust handoff, memory-persistence, or tool-call reliability workflows; when you need to verify handoffs work, check memory persistence, validate tool calls succeeded, or convert vague reliability goals into concrete, testable checks at each boundary point with explicit failure-class mapping (operational vs. critical); or when you want to test your workflow end-to-end, make sure it works, or verify your automation runs correctly using read-back probes and escalation triggers rather than agent confidence. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.
96
90%
Does it follow best practices?
Impact
98%
1.25xAverage score across 9 eval scenarios
Passed
No known issues
boundary contract generation
Contract table exists
100%
100%
Invariant specificity
100%
100%
Failure mapping
100%
100%
Unknown state handling
100%
100%
File handoff contract
Boundary identification
100%
100%
Artifact exists invariant
100%
100%
Schema valid invariant
100%
100%
Table format
0%
100%
Failure class mapping
0%
100%
Escalation trigger defined
0%
100%
Verification probe defined
0%
100%
Path exists assert
100%
100%
Schema parse check
100%
100%
Missing file as critical
100%
100%
Retry-then-halt escalation
0%
0%
Memory resume verification
Memory resume boundary
100%
100%
Key exists invariant
100%
100%
Timestamp freshness invariant
100%
100%
Value deserialises invariant
100%
100%
Table columns present
0%
100%
Stale entry as operational
0%
100%
Missing key as critical
0%
100%
Re-computation escalation
100%
100%
Timestamp check in script
100%
100%
Objective checks only
100%
100%
Non-null probe
100%
100%
Tool call reliability contract
Tool call boundary named
100%
100%
HTTP status invariant
100%
100%
Required fields invariant
100%
100%
Re-fetch probe
0%
100%
Required keys validation probe
100%
100%
Non-2xx as operational
44%
100%
Missing fields as critical
44%
100%
Two-failure escalation
0%
100%
Table format correct
50%
100%
Script field validation
100%
100%
Unknown state as operational
87%
100%
Non-zero exit on failure
100%
100%
Multi-boundary workflow contract
Multiple boundary types
100%
100%
Five-column table
0%
100%
Invariants for each boundary
100%
100%
Probes for each boundary
100%
100%
Failure class for each boundary
0%
100%
Escalation trigger for each boundary
100%
100%
Artifact exists invariant used
100%
100%
Timestamp freshness invariant used
100%
100%
Checksum or hash invariant used
100%
100%
Critical vs operational distinction
0%
100%
Final report boundary included
100%
100%
Resume/readiness boundary included
100%
100%
Escalation trigger design
File handoff escalation
75%
83%
Memory resume escalation
100%
100%
API call escalation
33%
58%
Five-column table
0%
100%
Critical vs operational
20%
100%
Critical triggers halt
80%
100%
Operational triggers retry
80%
100%
Missing evidence = operational
90%
100%
Invariants in table
57%
100%
Probes in table
28%
100%
Invariant check implementation
Path exists check
100%
100%
Non-empty check
100%
100%
JSON parse check
100%
100%
Timestamp freshness check
100%
100%
SHA-256 checksum
100%
100%
Exit code on failure
100%
100%
Per-check output
100%
100%
Conditional timestamp check
100%
100%
Contract table present
100%
100%
Failure message specificity
100%
100%
Assert pattern or equivalent
100%
100%
Objective vs confidence triggers
Objective triggers only
100%
100%
No confidence-based trigger
100%
100%
Unverifiable state classified
100%
100%
File write boundary trigger
100%
100%
API integration boundary trigger
100%
100%
Cache freshness boundary trigger
100%
100%
Failure classification present
100%
100%
Design principle documented
100%
100%
Unknown state principle documented
100%
100%
Five-column table
0%
100%
Failure classification mapping
Missing artifact as critical
100%
100%
Bad schema as operational
0%
100%
Stale timestamp as operational
100%
100%
Non-2xx as operational
100%
100%
Missing fields as critical
100%
100%
Unknown state as operational minimum
100%
100%
Five-column table
50%
100%
Four boundary types
100%
100%
Critical halt escalation
100%
100%
Operational retry escalation
100%
100%
Taxonomy completeness
100%
100%
Table of Contents