Use when the user wants an adversarial double-check of a code or config change. Run the strongest checks available, try to break the claim, look for edge cases and hidden regressions, and return PASS, PARTIAL, or FAIL with evidence. Good triggers include "poke holes in this", "stress test this change", "double check this fix", and "try to break it".
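The sketch below illustrates "run the strongest checks available ... with evidence". It runs one command and keeps its exit code and output as evidence for the report. `Evidence` and `run_check` are hypothetical names and are not part of the skill. The demo command calls the current Python interpreter only so that the example is self-contained. In practice it would be the project's own test or repro command.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical record of one check: what ran, how it ended, what it printed."""
    command: list[str]
    returncode: int
    output: str


def run_check(command: list[str], timeout: float = 60.0) -> Evidence:
    """Run a command and capture its exit code and combined output as evidence."""
    result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    # Keep stdout and stderr together so the report shows everything the check printed.
    return Evidence(command, result.returncode, result.stdout + result.stderr)


if __name__ == "__main__":
    # Stand-in for a real check such as a project's test suite.
    ev = run_check([sys.executable, "-c", "print('ok')"])
    print(ev.returncode, ev.output.strip())
```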
84
94%
Does it follow best practices?
Impact
81%
1.30x
Average score across 8 eval scenarios
Passed
No known issues
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly communicates its adversarial verification purpose, lists concrete actions it performs, and provides explicit trigger guidance with natural user phrases. It uses proper third-person voice throughout and occupies a distinct niche that would be easy for Claude to differentiate from other skills.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'targeted tests, edge-case repros, command runs, or API calls', 'try to break the claim', 'look for hidden regressions', and 'return PASS, PARTIAL, or FAIL with evidence'. These are concrete, actionable capabilities. | 3 / 3 |
| Completeness | Clearly answers both 'what' (run checks, try to break claims, look for regressions, return PASS/PARTIAL/FAIL with evidence) and 'when' (explicitly starts with 'Use when the user wants an adversarial double-check' and lists specific trigger phrases). | 3 / 3 |
| Trigger Term Quality | Includes excellent natural trigger phrases users would actually say: 'poke holes in this', 'stress test this change', 'double check this fix', 'try to break it', plus domain terms like 'adversarial double-check', 'code or config change', 'hidden regressions'. | 3 / 3 |
| Distinctiveness Conflict Risk | Occupies a clear niche as an adversarial verification/stress-testing skill, distinct from general code review or testing skills. The specific framing around 'breaking' claims and returning PASS/PARTIAL/FAIL verdicts makes it unlikely to conflict with standard code review or testing skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
85%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured, concise skill that clearly defines an adversarial verification workflow with explicit verdict criteria and good guardrails (e.g., PARTIAL when the environment blocks strong verification). Its main weakness is that actionability stays at the instructional/process level: it tells Claude what to do conceptually but does not provide concrete executable examples of verification commands, test snippets, or tool invocations that would make the 'run the strongest checks' step more concrete.
Suggestions
- Add 1-2 concrete executable examples showing actual verification commands (e.g., running a specific test suite, crafting an edge-case input, using curl to hit an endpoint) to make step 4 more actionable.
- Expand the mini example into a full worked example showing the complete output format populated with realistic evidence and attempts, so Claude has a concrete template to follow.
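The sketch below shows what "crafting an edge-case input" can look like when the goal is to attack a claim, not confirm it. The claim, the `parse_port` function, and the case table are all hypothetical. Suppose a change claims that `parse_port` accepts only plain ASCII decimal strings in the range 1 to 65535. The probe feeds boundary and malformed inputs to the function and records every input whose behavior contradicts that claim.

```python
def parse_port(text: str) -> int:
    """Hypothetical function under test.

    The (hypothetical) change claims it accepts only plain ASCII decimal
    strings whose value is in 1..65535.
    """
    value = int(text)
    if not 1 <= value <= 65535:
        raise ValueError(f"port out of range: {value}")
    return value


# Each input maps to whether the claim says it should be accepted.
# Boundaries and malformed strings are where hidden regressions tend to live.
CASES: dict[str, bool] = {
    "1": True,
    "65535": True,
    "0": False,
    "65536": False,
    "-1": False,
    "": False,
    "abc": False,
    "80_80": False,
    "+80": False,
}


def probe(cases: dict[str, bool]) -> list[str]:
    """Return one line per input whose behavior contradicts the claim."""
    failures = []
    for raw, should_accept in cases.items():
        try:
            parse_port(raw)
            accepted = True
        except ValueError:
            accepted = False
        if accepted != should_accept:
            expected = "accept" if should_accept else "reject"
            actual = "accept" if accepted else "reject"
            failures.append(f"{raw!r}: expected {expected}, got {actual}")
    return failures


if __name__ == "__main__":
    for line in probe(CASES):
        print(line)
```

In this example the probe reports `'80_80'` and `'+80'`. Python's `int()` accepts underscore digit separators and a leading sign, so the claim as stated does not hold. A confirm-only test using ordinary port numbers would have missed this kind of counterexample, and it is the kind of evidence that supports a FAIL.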
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Every section earns its place. No unnecessary explanations of what adversarial testing is or why it matters. The rules section is tight and each bullet adds a distinct constraint. | 3 / 3 |
| Actionability | The workflow provides clear steps and the output format is concrete, but the guidance remains at the instructional level without executable code or specific commands. Steps like 'Run the strongest checks available' are somewhat vague: what tools, what commands? The mini example helps but is brief and not fully fleshed out. | 2 / 3 |
| Workflow Clarity | The 5-step workflow is clearly sequenced with a falsification bias explicitly stated. The verdict system (PASS/PARTIAL/FAIL) serves as a validation checkpoint, and the rules include explicit guidance for when verification is blocked or incomplete (return PARTIAL, not PASS), which functions as a feedback loop for uncertain outcomes. | 3 / 3 |
| Progressive Disclosure | For a skill under 50 lines with a single purpose, the content is well-organized into clear sections (Goal, Workflow, Output Format, Rules) with no need for external references. The structure supports easy scanning and discovery. | 3 / 3 |
| Total | | 11 / 12 Passed |
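The verdict rule credited under Workflow Clarity can be sketched as a small function. When any check breaks the claim, the result is FAIL. When verification is blocked or incomplete, the result is PARTIAL, never PASS. The `Status` and `CheckResult` types are illustrative assumptions and are not the skill's actual output format. Letting FAIL take precedence over blocked checks is also a design choice made for this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PASSED = "passed"    # the check ran and the claim held
    FAILED = "failed"    # the check ran and broke the claim
    BLOCKED = "blocked"  # the environment prevented the check from running


@dataclass
class CheckResult:
    """Hypothetical record of one verification attempt."""
    name: str
    status: Status
    evidence: str


def verdict(results: list[CheckResult]) -> str:
    """Collapse individual checks into PASS, PARTIAL, or FAIL."""
    if any(r.status is Status.FAILED for r in results):
        return "FAIL"  # one counterexample is enough to refute the claim
    if not results or any(r.status is Status.BLOCKED for r in results):
        return "PARTIAL"  # incomplete verification never rounds up to PASS
    return "PASS"
```

Treating an empty result list as PARTIAL follows the same reasoning: if no check ran, the claim is unverified, not confirmed.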
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation: 11 / 11 Passed
Validation for skill structure
No warnings or errors.