Comprehensive multi-perspective review using specialized judges with debate and consensus building
Quality: 13% (Does it follow best practices?)
Impact: 30
Evals: Pending (No eval scenarios have been run)
Issues: Passed (No known issues)
Optimize this skill with Tessl:

`npx tessl skill review --optimize ./plugins/reflexion/skills/critique/SKILL.md`

Quality
Discovery
0%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description is extremely vague and abstract, failing to communicate what specifically is being reviewed, what the 'specialized judges' evaluate, or when this skill should be triggered. It reads like a high-level concept rather than an actionable skill description, making it nearly impossible for Claude to correctly select this skill from a pool of alternatives.
Suggestions
Specify what is being reviewed (e.g., 'code', 'documents', 'proposals') and list concrete actions (e.g., 'evaluates code quality, checks for security vulnerabilities, assesses performance implications').
Add an explicit 'Use when...' clause with natural trigger terms users would actually say (e.g., 'Use when the user asks for a thorough code review, wants multiple perspectives on their design, or mentions wanting a comprehensive evaluation'); a rewritten description along these lines is sketched after this list.
Clarify the domain to distinguish this from other review-related skills and reduce conflict risk (e.g., 'Reviews pull requests using multiple evaluation criteria including correctness, style, and security').
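For illustration, here is what a sharper description incorporating these suggestions might look like. This is a sketch only: the skill name is taken from the file path above, while the wording, trigger phrases, and review criteria are assumptions to be adapted to what the skill actually reviews.

```markdown
---
name: critique
description: >
  Reviews code changes and pull requests from three judge perspectives
  (requirements coverage, architecture, and code quality), debates
  disagreements between the judges, and produces a consensus report.
  Use when the user asks for a thorough code review, wants multiple
  perspectives on a design or change, or requests a comprehensive
  evaluation before merging.
---
```

It names the subject (code changes and pull requests), lists concrete evaluation criteria, and closes with an explicit 'Use when...' clause built from phrases a user would plausibly type.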
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description uses vague, abstract language like 'comprehensive multi-perspective review' and 'specialized judges' without specifying what is being reviewed, what the judges evaluate, or what concrete actions are performed. No specific capabilities are listed. | 1 / 3 |
| Completeness | The description weakly addresses 'what' with vague language and completely lacks a 'when' clause. There is no 'Use when...' guidance or any explicit trigger conditions for when Claude should select this skill. | 1 / 3 |
| Trigger Term Quality | The terms used ('multi-perspective review', 'specialized judges', 'debate and consensus building') are abstract and jargon-heavy. Users are unlikely to naturally say any of these phrases when requesting help. There are no natural trigger keywords like 'code review', 'document review', or specific domain terms. | 1 / 3 |
| Distinctiveness / Conflict Risk | The description is so generic that 'review' could apply to code review, document review, PR review, or any evaluation task. Without specifying the domain or subject matter, it could easily conflict with many other review-related skills. | 1 / 3 |
| Total | | 4 / 12 Passed |
Implementation
27%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill attempts an ambitious multi-agent review pattern but suffers from extreme verbosity—full prompt templates for three judges and a complete report template are all inlined, making it a ~350+ line monolith. While the workflow structure is reasonable, it lacks validation checkpoints and the actionability is undermined by unresolved template variables and CLI examples that aren't actually implemented. The content would benefit enormously from splitting judge prompts and report templates into separate files.
Suggestions
Extract the three judge prompt templates and the report template into separate bundle files (e.g., judges/requirements-validator.md, judges/solution-architect.md, judges/code-quality.md, templates/report.md) and reference them from the main skill; a sketch of this layout follows this list.
Remove the <context> section explaining what Multi-Agent Debate and LLM-as-a-Judge patterns are—Claude doesn't need this conceptual background to execute the workflow.
Add explicit validation checkpoints: verify judge output structure before proceeding to Phase 3, and add a gate after Phase 1 scope confirmation before spawning judges.
Resolve the template variable strategy—clarify how {requirements}, {summary of changes}, etc. get populated from the context gathering phase, or provide a concrete code example of Task tool invocation with actual parameter passing.
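A minimal sketch of how the restructured skill could tie these suggestions together, assuming the bundle layout proposed above and the Task tool with subagent_type that the skill already references. The file names, subagent type, and expected output fields are illustrative, not part of the current implementation:

```markdown
## Phase 2: Spawn judges

For each judge (requirements-validator, solution-architect, code-quality):

1. Read the prompt template from `judges/<judge-name>.md`.
2. Substitute the values gathered in Phase 1: replace `{requirements}` with
   the confirmed requirements and `{summary of changes}` with the diff summary.
3. Launch the judge with the Task tool (for example,
   `subagent_type: general-purpose`), passing the filled-in template as the
   prompt.
4. Checkpoint: confirm the judge's output contains a score, findings, and a
   recommendation before moving to Phase 3. If the output is malformed,
   re-run that judge once, then report the failure to the user.

In Phase 4, render the consensus report from `templates/report.md`.
```

Keeping the templates in bundle files leaves the main skill scannable, and the checkpoint in step 4 supplies the gate the current workflow is missing.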
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~350+ lines. Massive prompt templates are spelled out in full for each judge, the report template is exhaustively detailed, and much of the content explains concepts Claude already understands (what multi-agent debate is, what Chain-of-Verification means, how to be objective). The context section explaining the pattern names adds no actionable value. | 1 / 3 |
| Actionability | The workflow is concrete in structure (4 phases, specific judge prompts, report template), and it mentions using the Task tool with subagent_type. However, the actual implementation details are incomplete: there is no executable code, the template variables like {requirements} are placeholders without clear resolution, and the usage examples (bash commands with /critique) suggest a CLI interface that isn't actually implemented anywhere. | 2 / 3 |
| Workflow Clarity | The four-phase workflow is clearly sequenced and the debate/consensus phase includes a feedback loop. However, there are no validation checkpoints: no step verifies that judge outputs conform to the expected structure, there is no error handling if a judge agent fails or returns malformed output, and Phase 1's scope confirmation has no clear gate before proceeding. | 2 / 3 |
| Progressive Disclosure | This is a monolithic wall of text with no bundle files or external references. The three full judge prompt templates, the complete report template, and all guidelines are inlined in a single massive file. The judge prompts and report template should clearly be in separate referenced files to keep the main skill scannable. | 1 / 3 |
| Total | | 6 / 12 Passed |
Validation
90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 Passed |
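The frontmatter_unknown_keys warning can usually be cleared by deleting the unrecognized keys or nesting them under metadata, as the message suggests. A sketch, assuming a hypothetical unrecognized top-level key named version (the report does not list the actual offending keys):

```markdown
---
name: critique
description: (unchanged)
metadata:
  version: 1.2.0  # hypothetical key moved down from the top level
---
```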