pantheon-ai/challenge

Challenge AI output with structured devil's-advocate protocols: anchor, verify, framing, and deep sub-commands.

86

Quality: 86%. Does it follow best practices?

Impact: Pending. No eval scenarios have been run.

Security by Snyk: Passed. No known issues.

Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong skill description that clearly defines its purpose (challenging and stress-testing AI output), provides extensive natural trigger terms that users would actually say, and includes helpful subcommand details. The description is concise yet comprehensive, with a well-defined niche that distinguishes it from other skills.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple concrete actions: challenge, push back, play devil's advocate, and enumerates specific subcommands (anchor, verify, framing, deep) with brief explanations of each. | 3 / 3 |
| Completeness | Clearly answers both 'what' (challenge/push back/devil's advocate on AI output) and 'when' (explicit 'Use when:' clause with extensive trigger phrases), plus includes subcommand details for additional context. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms users would actually say: 'are you sure', 'prove it', 'what if you're wrong', 'sanity check', 'too confident', 'really?', 'poke holes', 'second opinion'. These are highly natural phrases. | 3 / 3 |
| Distinctiveness Conflict Risk | Occupies a very clear niche (meta-cognitive critique of AI output) that is unlikely to overlap with typical task-oriented skills. The trigger terms are distinctly about questioning and challenging rather than producing content. | 3 / 3 |
| Total | | 12 / 12 (Passed) |

Implementation

70%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured skill that serves effectively as a dispatch hub with clear routing logic, good error handling (AskUserQuestion guard), and excellent progressive disclosure to protocol files. Its main weaknesses are moderate verbosity in the When to Use/Not Use and Anti-Patterns sections, and the fact that actionable execution details are almost entirely delegated to external files, making the skill itself more of a router than a self-contained guide. The Thinking Transparency framework and structured output requirements add good value.

Suggestions

Trim the 'When to Use' and 'When Not to Use' sections to bullet points without the extended explanations — Claude can infer the reasoning from the context.

Consider inlining a minimal executable example for at least one subcommand (e.g., anchor) so the skill is partially self-contained even if protocol files are unavailable.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is mostly efficient but includes some sections that could be tightened. The 'When to Use' and 'When Not to Use' sections are somewhat verbose with explanations Claude could infer, and the Anti-Patterns section repeats 'Why:' justifications that add bulk. However, the dispatch table and core workflow are lean. | 2 / 3 |
| Actionability | The skill provides a clear dispatch table and subcommand structure, but the actual execution logic is delegated to external reference files (protocols/*.md). The usage examples are illustrative comments rather than executable demonstrations. The 'deep' subcommand has concrete instructions (spawn sub-agent via Agent tool), but most subcommands just say 'read protocol file → execute' without inline fallback guidance. | 2 / 3 |
| Workflow Clarity | The workflow is clearly sequenced: parse subcommand → dispatch to protocol or fallback to AskUserQuestion → execute protocol → produce Challenge Report. The AskUserQuestion guard provides an explicit error recovery loop for a known bug. The Thinking Transparency section provides a clear 4-step structure for findings. The deep subcommand has explicit instructions about fresh context. | 3 / 3 |
| Progressive Disclosure | Excellent progressive disclosure structure. The SKILL.md serves as a clear overview/dispatcher with well-signaled one-level-deep references to protocol files (anchor.md, verify.md, framing.md) and a reference catalog (reference.md). Content is appropriately split between the overview and detailed protocol files. | 3 / 3 |
| Total | | 10 / 12 (Passed) |
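
The routing sequence described under Workflow Clarity (parse subcommand, dispatch to a protocol, otherwise fall back to asking the user) might look roughly like the sketch below. It is illustrative only and is not taken from the skill itself: the `Dispatch` type, the `dispatch` function, the action labels, and the `protocols/<name>.md` paths are assumptions inferred from the file names mentioned in this review.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed locations: the review names anchor.md, verify.md, framing.md
# and a protocols/*.md pattern, but the exact paths are not confirmed.
PROTOCOL_FILES = {
    "anchor": "protocols/anchor.md",
    "verify": "protocols/verify.md",
    "framing": "protocols/framing.md",
}


@dataclass
class Dispatch:
    """Hypothetical routing decision for one invocation of the skill."""
    action: str                   # "run_protocol", "spawn_sub_agent", or "ask_user"
    target: Optional[str] = None  # protocol file to read, when applicable


def dispatch(args: str) -> Dispatch:
    """Route the first token of the arguments to a protocol or a fallback."""
    tokens = args.strip().split()
    subcommand = tokens[0].lower() if tokens else ""

    if subcommand in PROTOCOL_FILES:
        # Read the protocol file, then execute it.
        return Dispatch("run_protocol", PROTOCOL_FILES[subcommand])
    if subcommand == "deep":
        # The review says 'deep' spawns a sub-agent with fresh context.
        return Dispatch("spawn_sub_agent")
    # Missing or unrecognized subcommand: ask the user which one they want.
    return Dispatch("ask_user")
```

Keeping the routing table as plain data means an inline fallback for a subcommand, as the suggestions above recommend, could be added without touching the control flow.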

Validation

81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 9 / 11 (Passed) |
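
To make the two warnings above concrete, here is a minimal sketch of how checks of this kind could work. It is not the validator's actual logic: `KNOWN_KEYS`, `KNOWN_TOOLS`, and `check_frontmatter` are hypothetical names, and both sets are assumed values chosen for illustration, so the real spec and tool list may differ.

```python
# Assumed set of recognized frontmatter keys; the real spec may differ.
KNOWN_KEYS = {
    "name", "description", "license",
    "compatibility", "metadata", "allowed-tools",
}

# Hypothetical allowlist of tool names, for illustration only.
KNOWN_TOOLS = {"Read", "Write", "Edit", "Bash", "Grep", "Glob"}


def check_frontmatter(frontmatter: dict) -> list:
    """Return warning strings for unknown keys and unusual tool names."""
    warnings = []

    unknown = sorted(set(frontmatter) - KNOWN_KEYS)
    if unknown:
        warnings.append("frontmatter_unknown_keys: " + ", ".join(unknown))

    tools = frontmatter.get("allowed-tools", [])
    if isinstance(tools, str):
        # Accept either a space- or comma-separated string.
        tools = tools.replace(",", " ").split()
    unusual = sorted(t for t in tools if t not in KNOWN_TOOLS)
    if unusual:
        warnings.append("allowed_tools_field: " + ", ".join(unusual))

    return warnings
```

Both checks produce warnings rather than failures, which matches the results shown above, where the skill still passes overall.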
