Creates boundary-point validation contracts, defines invariant-based success criteria, and sets up automated verification probes so reliability workflows trigger on objective evidence rather than intuition. Use when designing robust handoff, memory-persistence, or tool-call reliability workflows; when you need to verify handoffs work, check memory persistence, validate tool calls succeeded, or convert vague reliability goals into concrete, testable checks at each boundary point with explicit failure-class mapping (operational vs. critical); or when you want to test your workflow end-to-end, make sure it works, or verify your automation runs correctly using read-back probes and escalation triggers rather than agent confidence. Includes explicit untrusted-content/prompt-injection guardrails for third-party inputs.
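The skill's own code is not reproduced on this page, so the following is only a minimal sketch of what a read-back probe with failure-class mapping could look like. It assumes a probe writes a value at a boundary, reads it back, and compares the two. `FailureClass`, `ProbeResult`, and `read_back_probe` are hypothetical names chosen for illustration, not the skill's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Optional


class FailureClass(Enum):
    """Hypothetical two-level mapping mirroring 'operational vs. critical'."""
    OPERATIONAL = "operational"
    CRITICAL = "critical"


@dataclass
class ProbeResult:
    boundary: str
    passed: bool
    failure_class: Optional[FailureClass] = None


def read_back_probe(
    boundary: str,
    write: Callable[[Any], None],
    read: Callable[[], Any],
    expected: Any,
    on_fail: FailureClass,
) -> ProbeResult:
    """Write a value, read it back, and judge success on the observed value only."""
    write(expected)
    observed = read()
    if observed == expected:
        return ProbeResult(boundary, True)
    # The invariant 'what was written can be read back' did not hold.
    return ProbeResult(boundary, False, on_fail)


# Usage with an in-memory dict standing in for a real memory store (assumption).
store: dict = {}
result = read_back_probe(
    boundary="memory-persistence",
    write=lambda value: store.__setitem__("note", value),
    read=lambda: store.get("note"),
    expected="handoff-ready",
    on_fail=FailureClass.CRITICAL,
)
print(result.passed)
```

The point of the sketch is that the pass/fail decision comes from the value read back at the boundary, not from the caller's belief that the write succeeded.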
96
Quality
90%
Does it follow best practices?
Impact
98%
1.25x
Average score across 9 eval scenarios
Discovery
85%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description excels at specificity and completeness, clearly articulating what the skill does and when to use it with an explicit 'Use when' clause. However, the heavy reliance on technical jargon ('boundary-point validation contracts', 'invariant-based success criteria', 'failure-class mapping') may reduce discoverability since users are unlikely to naturally use these exact terms when seeking this functionality.
Suggestions
- Add simpler, more natural trigger terms alongside technical ones (e.g., 'test if my workflow works', 'check if handoffs succeed', 'verify automation reliability').
- Include common user phrasings like 'debug workflow', 'troubleshoot automation', or 'ensure reliability' to improve trigger term coverage.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'Creates boundary-point validation contracts', 'defines invariant-based success criteria', 'sets up automated verification probes'. These are distinct, actionable capabilities. | 3 / 3 |
| Completeness | Clearly answers both what (creates validation contracts, defines success criteria, sets up verification probes) AND when with explicit 'Use when' clause covering multiple trigger scenarios (designing reliability workflows, verifying handoffs, testing end-to-end). | 3 / 3 |
| Trigger Term Quality | Contains some relevant keywords like 'handoff', 'memory persistence', 'tool calls', 'workflow', 'verification', but uses heavy technical jargon ('invariant-based', 'boundary-point validation contracts', 'failure-class mapping') that users are unlikely to naturally say. Missing simpler variations users might use. | 2 / 3 |
| Distinctiveness Conflict Risk | Highly specific niche around boundary-point validation and reliability verification with distinct technical domain. Unlikely to conflict with general testing or workflow skills due to specific focus on 'validation contracts', 'invariant-based criteria', and 'verification probes'. | 3 / 3 |
| Total | | 11 / 12 Passed |
Implementation
92%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a high-quality skill that provides clear, actionable guidance for creating detectability contracts. The workflow is well-structured with explicit validation steps, concrete code examples, and a useful contract table template. The only minor weakness is the lengthy inline guardrails section that could benefit from being referenced externally.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is lean and efficient, presenting only actionable information without explaining concepts Claude already knows. Every section serves a clear purpose with no padding or unnecessary context. | 3 / 3 |
| Actionability | Provides concrete, executable Python code examples for invariant checks, a clear workflow with numbered steps, and a detailed example contract table that serves as a copy-paste template for implementation. | 3 / 3 |
| Workflow Clarity | The 5-step workflow is clearly sequenced with explicit validation checkpoints. The contract table includes failure class mapping and escalation triggers, providing clear feedback loops for error recovery. | 3 / 3 |
| Progressive Disclosure | Content is well-organized with clear sections, but the guardrails section (especially W011 mitigation) is quite lengthy inline. For a skill of this complexity, some content could be split into referenced files for better navigation. | 2 / 3 |
| Total | | 11 / 12 Passed |
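The review refers to a contract table with failure-class mapping and escalation triggers, but the table itself is not shown here. As a rough illustration only, one row of such a table might be modeled as data with a small decision function. The field names, the `max_retries` threshold, and the example row are assumptions, not values taken from the skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContractRow:
    """Hypothetical row of a detectability contract table."""
    boundary: str
    invariant: str
    failure_class: str  # "operational" or "critical" (assumed labels)
    max_retries: int    # assumed escalation threshold for operational failures


def next_action(row: ContractRow, failures: int) -> str:
    """Map an observed failure count to an action for this boundary."""
    if failures == 0:
        return "continue"
    if row.failure_class == "critical":
        # Critical failures escalate on the first observed violation.
        return "escalate"
    # Operational failures are retried until the threshold is exceeded.
    return "retry" if failures <= row.max_retries else "escalate"


# Example row (illustrative values only).
handoff = ContractRow(
    boundary="agent-handoff",
    invariant="receiver echoes the task id it was given",
    failure_class="operational",
    max_retries=2,
)
print(next_action(handoff, 1))
print(next_action(handoff, 3))
```

Keeping the escalation rule in one function means the trigger depends on counted, observed failures rather than on an agent's stated confidence.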
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
Install with Tessl CLI
npx tessl i markusdowne/detectability-contract