Challenge AI output with structured devil's-advocate protocols: anchor, verify, framing, and deep sub-commands.
86%
Does it follow best practices?
Impact: Pending (no eval scenarios have been run)
Passed (no known issues)
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly defines its purpose (challenging and stress-testing AI output), provides extensive natural trigger terms that users would actually say, and includes helpful subcommand details. The description is concise yet comprehensive, with a well-defined niche that distinguishes it from other skills.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple concrete actions: challenge, push back, play devil's advocate, and enumerates specific subcommands (anchor, verify, framing, deep) with brief explanations of each. | 3 / 3 |
| Completeness | Clearly answers both 'what' (challenge/push back/devil's advocate on AI output) and 'when' (explicit 'Use when:' clause with extensive trigger phrases), plus includes subcommand details for additional context. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms users would actually say: 'are you sure', 'prove it', 'what if you're wrong', 'sanity check', 'too confident', 'really?', 'poke holes', 'second opinion'. These are highly natural phrases. | 3 / 3 |
| Distinctiveness Conflict Risk | Occupies a very clear niche (meta-cognitive critique of AI output) that is unlikely to overlap with typical task-oriented skills. The trigger terms are distinctly about questioning and challenging rather than producing content. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
70%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured skill that works effectively as a dispatch hub, with clear routing logic, good error handling (the AskUserQuestion guard), and excellent progressive disclosure to protocol files. Its main weaknesses are moderate verbosity in the 'When to Use', 'When Not to Use', and Anti-Patterns sections, and the fact that actionable execution details are almost entirely delegated to external files, which makes the skill itself more a router than a self-contained guide. The Thinking Transparency framework and the structured output requirements add good value.
Suggestions
- Trim the 'When to Use' and 'When Not to Use' sections to bullet points without the extended explanations; Claude can infer the reasoning from context.
- Consider inlining a minimal executable example for at least one subcommand (e.g., anchor) so the skill is partially self-contained even if the protocol files are unavailable.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is mostly efficient but includes some sections that could be tightened. The 'When to Use' and 'When Not to Use' sections are somewhat verbose, with explanations Claude could infer, and the Anti-Patterns section repeats 'Why:' justifications that add bulk. However, the dispatch table and core workflow are lean. | 2 / 3 |
| Actionability | The skill provides a clear dispatch table and subcommand structure, but the actual execution logic is delegated to external reference files (protocols/*.md). The usage examples are illustrative comments rather than executable demonstrations. The 'deep' subcommand has concrete instructions (spawn a sub-agent via the Agent tool), but most subcommands just say 'read protocol file → execute' without inline fallback guidance. | 2 / 3 |
| Workflow Clarity | The workflow is clearly sequenced: parse subcommand → dispatch to protocol or fall back to AskUserQuestion → execute protocol → produce Challenge Report. The AskUserQuestion guard provides an explicit error-recovery loop for a known bug. The Thinking Transparency section provides a clear 4-step structure for findings. The deep subcommand has explicit instructions about fresh context. | 3 / 3 |
| Progressive Disclosure | Excellent progressive disclosure structure. The SKILL.md serves as a clear overview/dispatcher with well-signaled, one-level-deep references to protocol files (anchor.md, verify.md, framing.md) and a reference catalog (reference.md). Content is appropriately split between the overview and the detailed protocol files. | 3 / 3 |
| Total | | 10 / 12 Passed |
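The routing workflow the review describes (parse the subcommand, dispatch to a protocol file, otherwise ask the user) can be sketched as a small function. This is a minimal sketch under stated assumptions, not the skill's actual logic: the `route` function, its action labels, and the `protocols/<name>.md` paths are hypothetical, and the inline fallback for a missing protocol file models the review's suggestion rather than existing behavior.

```python
from pathlib import Path

# Hypothetical file layout; the review names anchor.md, verify.md and framing.md
# but does not give their exact paths.
PROTOCOLS = {
    "anchor": Path("protocols/anchor.md"),
    "verify": Path("protocols/verify.md"),
    "framing": Path("protocols/framing.md"),
}
# "deep" is not file-backed: per the review it spawns a fresh-context sub-agent.
SUBCOMMANDS = set(PROTOCOLS) | {"deep"}


def route(argument: str, base: Path = Path(".")) -> tuple[str, str]:
    """Map a raw argument string to a hypothetical (action, target) pair."""
    words = argument.strip().split()
    sub = words[0].lower() if words else ""
    if sub not in SUBCOMMANDS:
        # No recognizable subcommand: ask the user rather than guess.
        return ("ask_user", "choose one of: " + ", ".join(sorted(SUBCOMMANDS)))
    if sub == "deep":
        return ("spawn_subagent", "deep")
    path = base / PROTOCOLS[sub]
    if not path.is_file():
        # Assumed inline fallback (the review's suggestion) when the file is missing.
        return ("inline_fallback", sub)
    return ("run_protocol", str(path))
```

Returning a plain tuple keeps the routing decision testable on its own; in the real skill these decisions are carried out by the agent following SKILL.md, not by executable code.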
Validation
81%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation: 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 9 / 11 Passed |
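Both warnings boil down to set-difference checks on the SKILL.md frontmatter. The sketch below is rough and assumption-heavy: the recognized key set (`name`, `description`, `license`, `compatibility`, `metadata`, `allowed-tools`) is assumed and should be checked against the current spec, the tool allowlist is purely hypothetical, and the validator's real rules are not given in this review. To stay stdlib-only, it reads top-level `key: value` lines instead of parsing full YAML.

```python
# Assumed set of recognized frontmatter keys; verify against the current spec.
KNOWN_KEYS = {"name", "description", "license", "compatibility", "metadata", "allowed-tools"}
# Hypothetical tool allowlist, for illustration only.
KNOWN_TOOLS = {"Read", "Write", "Edit", "Bash", "Glob", "Grep"}


def frontmatter_lines(text: str) -> list[str]:
    """Return the lines between the opening and closing '---', or [] if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    body = []
    for line in lines[1:]:
        if line.strip() == "---":
            return body
        body.append(line)
    return []  # unterminated frontmatter is treated as absent


def check(text: str) -> list[tuple[str, list[str]]]:
    """Return (warning_id, offending_items) pairs for the two checks."""
    top: dict[str, str] = {}
    for line in frontmatter_lines(text):
        # Simplification: only unindented, non-comment "key: value" lines count.
        if line and not line[0].isspace() and not line.startswith("#") and ":" in line:
            key, _, value = line.partition(":")
            top[key.strip()] = value.strip()
    warnings = []
    unknown = sorted(set(top) - KNOWN_KEYS)
    if unknown:
        warnings.append(("frontmatter_unknown_keys", unknown))
    # Comma- or space-separated tool list; "Bash(git:*)" is reduced to "Bash".
    # Entries with spaces inside the parentheses are not handled.
    raw = top.get("allowed-tools", "").replace(",", " ")
    names = {token.split("(")[0] for token in raw.split()}
    unusual = sorted(names - KNOWN_TOOLS)
    if unusual:
        warnings.append(("allowed_tools_field", unusual))
    return warnings
```

For example, a frontmatter containing `version: 1` and `allowed-tools: Read, FooTool` yields one warning for each check, mirroring the two Warning rows above.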