"Multi-agent adversarial verification with convergence loop. Two independent review agents must both pass before output ships."
Overall score: 38%
Does it follow best practices?

Impact: Pending (no eval scenarios have been run)
Status: Passed (no known issues)

Quality
Discovery: 7%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description is heavily laden with technical jargon and buzzwords while failing to communicate concrete actions, applicable domains, or trigger conditions. It reads more like an architecture note than a skill description, providing no guidance for when Claude should select it. A user needing quality review or verification would never use these terms naturally.
Suggestions
- Replace abstract jargon with concrete actions: specify what is being verified (e.g., 'Verifies code output quality by running two independent review passes checking for correctness and edge cases').
- Add an explicit 'Use when...' clause with natural trigger terms users would actually say, such as 'Use when the user asks for thorough review, double-checking, or high-confidence verification of generated output'.
- Specify the domain or type of output this applies to (code, documents, data analysis, etc.) to make the skill distinguishable and actionable. A combined rewrite is sketched below.
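As a purely illustrative sketch of how these three suggestions could combine into frontmatter, with a hypothetical skill name and wording that is not taken from the skill under review:

```yaml
---
name: adversarial-output-review   # hypothetical name, for illustration only
description: >
  Verifies generated code or documents by running two independent review
  agents that must both pass before the output ships; failing reviews
  trigger a fix-and-rereview loop. Use when the user asks for thorough
  review, double-checking, or high-confidence verification of generated
  output.
---
```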
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description uses abstract, buzzword-heavy language like 'multi-agent adversarial verification' and 'convergence loop' without specifying concrete actions. It does not explain what is being verified, what kind of output is being reviewed, or what domain this applies to. | 1 / 3 |
| Completeness | The 'what' is vaguely described with abstract process language, and there is no 'when' clause or any explicit trigger guidance for when Claude should select this skill. | 1 / 3 |
| Trigger Term Quality | The terms used ('multi-agent', 'adversarial verification', 'convergence loop') are technical jargon that users would almost never naturally say. There are no natural trigger terms a user would use when needing this skill. | 1 / 3 |
| Distinctiveness / Conflict Risk | The jargon-heavy language is somewhat distinctive, in that few other skills would use terms like 'adversarial verification' or 'convergence loop', but the lack of domain specificity means it's unclear what this skill actually covers, making it hard to distinguish from other review/validation skills. | 2 / 3 |
| Total | | 5 / 12 (Passed) |
Implementation: 39%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill has excellent workflow clarity with a well-defined 4-phase process, convergence loop, and clear escalation path. However, it is severely bloated — much of the content (domain-specific rubrics, cost analysis, metrics, concept explanations) is either generic knowledge Claude already has or belongs in separate reference files. The actionability is moderate since the code is pseudocode rather than executable patterns for actual Claude Code usage.
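To make that workflow concrete, here is a hedged reconstruction of the loop from the review's own summary; the skill's exact phase names and the value of its iteration cap are not quoted in this report:

```markdown
1. Generate the candidate output.
2. Spawn two fresh, independent reviewer agents; each returns pass or fail.
3. If both reviewers pass, ship the output.
4. If either fails, apply the requested fixes and return to step 2 with
   brand-new reviewer agents (reviewers are never reused across rounds).
5. If the loop reaches its max iteration cap without converging, escalate
   to a human.
```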
Suggestions
- Cut the content by 50-60%: remove the 'core insight' explanation, the cost analysis section, the metrics section, and the domain-specific rubric extension tables; these are generic knowledge Claude already possesses.
- Split into SKILL.md (overview plus the 4 phases, briefly) with references to RUBRIC_TEMPLATES.md, BATCH_PATTERN.md, and FAILURE_MODES.md for detailed content (a skeleton is sketched after this list).
- Replace pseudocode with actual Claude Code Agent tool invocation syntax showing real tool_use patterns rather than fictional Python functions.
- Remove the 'When to Activate' and 'When NOT to use' sections; Claude can infer appropriate contexts from the skill description and architecture.
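A hedged skeleton of that split; only the three reference file names come from this review, and the section wording is illustrative:

```markdown
# SKILL.md (overview only)

Brief statement of the verification workflow, then the four phases in a
few lines each, ending with the convergence loop and escalation path.

For details, see:
- RUBRIC_TEMPLATES.md (domain-specific rubric extensions)
- BATCH_PATTERN.md (batch sampling patterns)
- FAILURE_MODES.md (failure mode tables)
```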
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is extremely verbose at ~300+ lines. It explains concepts Claude already knows (why independent review matters, what anchoring bias is, cost-benefit analysis), includes extensive tables of domain-specific rubric extensions that are generic knowledge, and pads with motivational rationale ('the core insight') that doesn't add actionable value. The cost analysis and metrics sections are largely filler. | 1 / 3 |
| Actionability | The skill provides structured pseudocode and prompt templates that are somewhat concrete, but nothing is truly executable; it is all pseudocode with fictional functions (generate(), Agent(), fix_agent.execute(), ship(), escalate_to_human()). The reviewer prompt template is the most actionable element, but the actual implementation for Claude Code subagents is hand-waved ('use the Agent tool to spawn reviewers'). | 2 / 3 |
| Workflow Clarity | The multi-step workflow is clearly sequenced across 4 phases with an explicit convergence loop, a max iteration cap, an escalation path, and the critical invariant that fresh agents must be used each round. The ASCII diagram, verdict gate logic, and fix cycle with explicit termination conditions provide strong workflow clarity with validation checkpoints. | 3 / 3 |
| Progressive Disclosure | This is a monolithic wall of text with no references to external files. Domain-specific rubric extensions, batch sampling patterns, failure mode tables, metrics, cost analysis, and integration notes are all inlined when they could be split into separate reference files. The skill would benefit enormously from being an overview that points to detailed materials. | 1 / 3 |
| Total | | 7 / 12 (Passed) |
Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation for skill structure: 10 / 11 Passed
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 Passed |
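The one warning is usually resolved by moving nonstandard keys under a metadata block, as the message suggests; a minimal sketch in which the offending key names are hypothetical, since the report does not list them:

```yaml
---
name: adversarial-output-review   # hypothetical name
description: ...
metadata:
  version: "0.2"                  # hypothetical key, moved down from the top level
  author: example-team            # hypothetical key, moved down from the top level
---
```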