Creates boundary-point validation contracts, defines invariant-based success criteria, and sets up automated verification probes so reliability workflows trigger on objective evidence rather than intuition. Use when designing robust handoff, memory-persistence, or tool-call reliability workflows; when you need to verify handoffs work, check memory persistence, validate tool calls succeeded, or convert vague reliability goals into concrete, testable checks at each boundary point with explicit failure-class mapping (operational vs. critical); or when you want to test your workflow end-to-end, make sure it works, or verify your automation runs correctly using read-back probes and escalation triggers rather than agent confidence. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.
96
Quality
90%
Does it follow best practices?
Impact
98%
1.25xAverage score across 9 eval scenarios
boundary contract generation
Contract table exists
100%
100%
Invariant specificity
100%
100%
Failure mapping
100%
100%
Unknown state handling
100%
100%
Without context: $0.2281 · 1m 11s · 10 turns · 13 in / 3,923 out tokens
With context: $0.2553 · 1m 6s · 14 turns · 552 in / 3,450 out tokens
File handoff contract
Boundary identification
100%
100%
Artifact exists invariant
100%
100%
Schema valid invariant
100%
100%
Table format
0%
100%
Failure class mapping
0%
100%
Escalation trigger defined
0%
100%
Verification probe defined
0%
100%
Path exists assert
100%
100%
Schema parse check
100%
100%
Missing file as critical
100%
100%
Retry-then-halt escalation
0%
0%
Without context: $0.3016 · 2m 11s · 12 turns · 15 in / 5,504 out tokens
With context: $0.5407 · 2m 30s · 25 turns · 408 in / 7,773 out tokens
Memory resume verification
Memory resume boundary
100%
100%
Key exists invariant
100%
100%
Timestamp freshness invariant
100%
100%
Value deserialises invariant
100%
100%
Table columns present
0%
100%
Stale entry as operational
0%
100%
Missing key as critical
0%
100%
Re-computation escalation
100%
100%
Timestamp check in script
100%
100%
Objective checks only
100%
100%
Non-null probe
100%
100%
Without context: $0.4224 · 1m 56s · 22 turns · 28 in / 6,305 out tokens
With context: $0.3769 · 1m 34s · 21 turns · 403 in / 5,268 out tokens
Tool call reliability contract
Tool call boundary named
100%
100%
HTTP status invariant
100%
100%
Required fields invariant
100%
100%
Re-fetch probe
0%
100%
Required keys validation probe
100%
100%
Non-2xx as operational
44%
100%
Missing fields as critical
44%
100%
Two-failure escalation
0%
100%
Table format correct
50%
100%
Script field validation
100%
100%
Unknown state as operational
87%
100%
Non-zero exit on failure
100%
100%
Without context: $0.2546 · 1m 13s · 14 turns · 19 in / 4,087 out tokens
With context: $0.5270 · 2m 2s · 26 turns · 409 in / 6,940 out tokens
Multi-boundary workflow contract
Multiple boundary types
100%
100%
Five-column table
0%
100%
Invariants for each boundary
100%
100%
Probes for each boundary
100%
100%
Failure class for each boundary
0%
100%
Escalation trigger for each boundary
100%
100%
Artifact exists invariant used
100%
100%
Timestamp freshness invariant used
100%
100%
Checksum or hash invariant used
100%
100%
Critical vs operational distinction
0%
100%
Final report boundary included
100%
100%
Resume/readiness boundary included
100%
100%
Without context: $0.2956 · 1m 38s · 13 turns · 19 in / 5,486 out tokens
With context: $0.4596 · 2m 23s · 21 turns · 29 in / 8,105 out tokens
Escalation trigger design
File handoff escalation
75%
83%
Memory resume escalation
100%
100%
API call escalation
33%
58%
Five-column table
0%
100%
Critical vs operational
20%
100%
Critical triggers halt
80%
100%
Operational triggers retry
80%
100%
Missing evidence = operational
90%
100%
Invariants in table
57%
100%
Probes in table
28%
100%
Without context: $0.1977 · 1m 23s · 8 turns · 12 in / 3,772 out tokens
With context: $0.4700 · 2m 16s · 22 turns · 407 in / 6,794 out tokens
Invariant check implementation
Path exists check
100%
100%
Non-empty check
100%
100%
JSON parse check
100%
100%
Timestamp freshness check
100%
100%
SHA-256 checksum
100%
100%
Exit code on failure
100%
100%
Per-check output
100%
100%
Conditional timestamp check
100%
100%
Contract table present
100%
100%
Failure message specificity
100%
100%
Assert pattern or equivalent
100%
100%
Without context: $0.2722 · 1m 13s · 14 turns · 17 in / 4,086 out tokens
With context: $0.4734 · 2m 16s · 26 turns · 31 in / 6,012 out tokens
Objective vs confidence triggers
Objective triggers only
100%
100%
No confidence-based trigger
100%
100%
Unverifiable state classified
100%
100%
File write boundary trigger
100%
100%
API integration boundary trigger
100%
100%
Cache freshness boundary trigger
100%
100%
Failure classification present
100%
100%
Design principle documented
100%
100%
Unknown state principle documented
100%
100%
Five-column table
0%
100%
Without context: $0.2218 · 1m 28s · 12 turns · 19 in / 3,691 out tokens
With context: $0.3468 · 1m 40s · 18 turns · 431 in / 5,187 out tokens
Failure classification mapping
Missing artifact as critical
100%
100%
Bad schema as operational
0%
100%
Stale timestamp as operational
100%
100%
Non-2xx as operational
100%
100%
Missing fields as critical
100%
100%
Unknown state as operational minimum
100%
100%
Five-column table
50%
100%
Four boundary types
100%
100%
Critical halt escalation
100%
100%
Operational retry escalation
100%
100%
Taxonomy completeness
100%
100%
Without context: $0.2052 · 1m 17s · 9 turns · 12 in / 3,396 out tokens
With context: $0.3473 · 1m 44s · 17 turns · 398 in / 5,153 out tokens
Install with Tessl CLI
npx tessl i markusdowne/detectability-contract@0.1.2Table of Contents