Creates boundary-point validation contracts, defines invariant-based success criteria, and sets up automated verification probes so reliability workflows trigger on objective evidence rather than intuition. Use it when designing robust handoff, memory-persistence, or tool-call reliability workflows; when you need to verify that handoffs work, memory persists, and tool calls succeed; or when you want to convert vague reliability goals into concrete, testable checks at each boundary point with explicit failure-class mapping (operational vs. critical). It also applies when you want to test a workflow end-to-end and confirm your automation runs correctly using read-back probes and escalation triggers rather than agent confidence. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.
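The boundary-contract idea above can be sketched in code. This is a minimal illustration, not the skill's actual implementation: the `BoundaryContract` type, the `Severity` classes, and the read-back memory probe are all hypothetical names chosen for the example. The key behaviors it shows are mapping each failure to an operational or critical class and treating unknown/unverifiable state as at least operational risk.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Severity(Enum):
    OK = "ok"
    OPERATIONAL = "operational"  # degrade gracefully, retry, log
    CRITICAL = "critical"        # halt the workflow and escalate


@dataclass
class BoundaryContract:
    name: str
    invariant: str
    probe: Callable[[], Optional[bool]]  # True / False / None (unverifiable)
    failure_severity: Severity


def check_boundary(contract: BoundaryContract) -> Severity:
    try:
        result = contract.probe()
    except Exception:
        result = None
    if result is True:
        return Severity.OK
    if result is None:
        # Unknown or unverifiable state counts as at least operational risk.
        return Severity.OPERATIONAL
    return contract.failure_severity


# Example: a read-back probe for memory persistence (hypothetical store).
store: dict = {}


def write_then_read_back() -> bool:
    store["handoff_id"] = "task-42"
    return store.get("handoff_id") == "task-42"


contract = BoundaryContract(
    name="memory-persistence",
    invariant="a written key is readable immediately after the write",
    probe=write_then_read_back,
    failure_severity=Severity.CRITICAL,
)
print(check_boundary(contract).value)  # → ok
```

The probe verifies by reading the state back rather than trusting that the write succeeded, which is the "objective evidence rather than intuition" point the description makes.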
Overall score: 96
Quality: 90% (does it follow best practices?)
Impact: 98% (1.25× average score across 9 eval scenarios)
{
"context": "Evaluates whether the agent produced outputs aligned with the skill's reliability-check workflow.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Contract table exists",
"description": "Output includes a boundary contract table with 3 or more boundaries",
"max_score": 25
},
{
"name": "Invariant specificity",
"description": "Each boundary defines concrete invariants and probes",
"max_score": 25
},
{
"name": "Failure mapping",
"description": "Each boundary maps failures to operational/critical outcomes with escalation trigger",
"max_score": 25
},
{
"name": "Unknown state handling",
"description": "Explicitly treats unknown/unverifiable state as at least operational risk",
"max_score": 25
}
]
}
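One plausible way a `weighted_checklist` rubric like the one above could be aggregated is to sum per-item scores, cap each at its `max_score`, and normalize to 100. That aggregation rule is an assumption for illustration, not documented behavior of the evaluator; the item names and weights are taken from the JSON.

```python
# Sketch of aggregating a weighted_checklist rubric. The normalization
# rule here is an assumption, not the evaluator's documented formula.
rubric = {
    "type": "weighted_checklist",
    "checklist": [
        {"name": "Contract table exists", "max_score": 25},
        {"name": "Invariant specificity", "max_score": 25},
        {"name": "Failure mapping", "max_score": 25},
        {"name": "Unknown state handling", "max_score": 25},
    ],
}


def aggregate(rubric: dict, awarded: dict) -> int:
    """Sum awarded points, capping each item at max_score, normalized to 100."""
    total = sum(min(awarded.get(i["name"], 0), i["max_score"])
                for i in rubric["checklist"])
    max_total = sum(i["max_score"] for i in rubric["checklist"])
    return round(100 * total / max_total)


print(aggregate(rubric, {
    "Contract table exists": 25,
    "Invariant specificity": 25,
    "Failure mapping": 25,
    "Unknown state handling": 20,
}))  # → 95
```

Missing items score zero, so an output that omits, say, the failure mapping loses that item's full 25 points.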