critique

Comprehensive multi-perspective review using specialized judges with debate and consensus building

Quality

19%

Does it follow best practices?

Security

Passed

No findings from the security scan

Fix and improve this skill with Tessl

tessl review fix ./plugins/reflexion/skills/critique/SKILL.md

Quality

Content

39%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The skill's primary strength is a well-structured multi-phase workflow with clear sequencing and validation checkpoints. However, it is extremely verbose: the three full judge prompt templates and the exhaustive report template are all inlined, creating a massive document that wastes context-window space. Much of the content explains patterns and concepts Claude already knows, and the actionable guidance remains descriptive rather than executable.

Suggestions

Extract the three judge prompt templates and the report template into separate referenced files (e.g., judges/requirements-validator.md, templates/report.md) to dramatically reduce the main skill's token footprint and improve progressive disclosure.

Remove the <context> section explaining what Multi-Agent Debate and LLM-as-a-Judge patterns are — Claude already knows these concepts. Replace with a single sentence stating the approach.

Show a concrete Task tool invocation example with actual tool call syntax rather than just describing it abstractly as 'Use the Task tool to spawn three specialized judge agents in parallel'.

Cut the 'Important Guidelines' section — principles like 'Be Objective', 'Be Specific', 'Be Constructive' are generic advice Claude already follows and add no unique value.

Dimension	Reasoning	Score
Conciseness	Extremely verbose at ~350+ lines. Massive prompt templates are spelled out in full for each judge, the report template is exhaustively detailed, and much of the content explains concepts Claude already understands (what multi-agent debate is, what Chain-of-Verification means, how to be objective). The context section explaining the pattern names adds no actionable value.	1 / 3
Actionability	The workflow is concrete in structure (phases, judge prompts, report template) and provides specific prompt templates, but it's ultimately pseudocode-level guidance — there are no executable code snippets, and the Task tool invocations are described abstractly ('Use the Task tool to spawn three specialized judge agents') rather than with concrete tool call syntax. The usage examples at the bottom are illustrative but not real executable commands.	2 / 3
Workflow Clarity	The four-phase workflow is clearly sequenced (Context Gathering → Independent Reviews → Cross-Review & Debate → Consensus Report) with explicit validation steps: Phase 1 includes scope confirmation, each judge uses Chain-of-Verification self-checks, Phase 3 includes a debate/resolution step for disagreements, and Phase 4 produces a structured report with prioritized action items. The feedback loop in Phase 3 (debate → consensus → unresolved notation) is well-defined.	3 / 3
Progressive Disclosure	The entire skill is a monolithic wall of text with no references to external files. The three full judge prompt templates, the complete report template, and all guidelines are inlined in a single massive document. This content would benefit enormously from splitting judge prompts and the report template into separate referenced files.	1 / 3
	Total	7 / 12 Passed

Description

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This description is extremely vague and abstract, reading more like a process methodology label than a skill description. It fails to specify what is being reviewed, what domain it applies to, what concrete outputs it produces, and when Claude should select it. Without any of these details, it would be nearly impossible for Claude to correctly choose this skill from a list of alternatives.

Suggestions

Specify what is being reviewed (e.g., 'code', 'documents', 'proposals') and what concrete actions are performed (e.g., 'evaluates code quality, identifies bugs, checks security vulnerabilities').

Add an explicit 'Use when...' clause with natural trigger terms a user would say, such as 'Use when the user asks for a thorough review, critique, or multi-angle evaluation of [specific content type]'.

Clarify what 'specialized judges' and 'consensus building' mean in practical terms — e.g., 'Applies multiple evaluation lenses (security, performance, readability) and synthesizes findings into a unified assessment'.
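
The suggestions above can be turned into a rough self-check before resubmitting a description. The sketch below is only a heuristic, not Tessl's scoring logic: the function name `check_description`, the `NATURAL_TRIGGERS` list (drawn from the example phrases in this review), and the `revised` wording are all hypothetical.

```python
import re

# Hypothetical trigger phrases, borrowed from the examples quoted in this review.
NATURAL_TRIGGERS = ("review my code", "get feedback", "evaluate this document", "critique")


def check_description(description: str) -> dict:
    """Heuristic checks that mirror the suggestions; not the reviewer's actual rubric."""
    text = description.lower()
    return {
        # Completeness: is there an explicit "Use when..." clause?
        "has_use_when": bool(re.search(r"\buse when\b", text)),
        # Trigger term quality: which natural user phrases appear?
        "trigger_terms": [t for t in NATURAL_TRIGGERS if t in text],
    }


# The description as currently published.
current = ("Comprehensive multi-perspective review using specialized judges "
           "with debate and consensus building")

# Hypothetical rewrite; the reviewed content type ("code changes") is an assumption.
revised = ("Reviews code changes through three judge perspectives and merges the "
           "findings into one prioritized report. Use when the user asks to "
           "review my code, get feedback, or critique recent work.")

print(check_description(current))
print(check_description(revised))
```

The current description fails both checks, while the hypothetical rewrite has a "Use when" clause and three of the four trigger phrases.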

Dimension	Reasoning	Score
Specificity	The description uses abstract language like 'comprehensive multi-perspective review' and 'specialized judges with debate and consensus building' without specifying what is being reviewed, what the judges evaluate, or what concrete actions are performed.	1 / 3
Completeness	The description only vaguely addresses 'what' (some kind of review process) and completely lacks a 'when' clause. There is no explicit trigger guidance for when Claude should select this skill.	1 / 3
Trigger Term Quality	The terms used ('multi-perspective review', 'specialized judges', 'debate', 'consensus building') are not natural keywords a user would say. A user would more likely say 'review my code', 'get feedback', or 'evaluate this document' rather than these abstract process-oriented terms.	1 / 3
Distinctiveness Conflict Risk	The description is so generic that 'review' could apply to code review, document review, design review, or any evaluation task. Without specifying the domain or type of content being reviewed, it could conflict with many other skills.	1 / 3
	Total	4 / 12 Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 10 / 11 Passed

Validation for skill structure

Criteria	Description	Result
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning

	Total	10 / 11 Passed
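
A warning like `frontmatter_unknown_keys` can be reproduced locally with a small check. The following is a minimal sketch under stated assumptions: `KNOWN_KEYS` is an assumed allow-list rather than the validator's real one, `unknown_frontmatter_keys` is a hypothetical helper, and the `argument-hint` key in the sample is invented for illustration, since the report does not say which key triggered the warning.

```python
# Assumed allow-list of top-level frontmatter keys; consult the skill spec for the real set.
KNOWN_KEYS = {"name", "description", "license", "allowed-tools", "metadata"}


def unknown_frontmatter_keys(skill_text: str) -> list[str]:
    """Return top-level frontmatter keys that are not in KNOWN_KEYS."""
    lines = skill_text.splitlines()
    # Frontmatter must open with a '---' line; otherwise there is nothing to check.
    if not lines or lines[0].strip() != "---":
        return []
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the frontmatter
            break
        # Top-level keys start in column 0; indented lines belong to nested values.
        if line and not line[0].isspace() and ":" in line and not line.startswith("#"):
            keys.append(line.split(":", 1)[0].strip())
    return [k for k in keys if k not in KNOWN_KEYS]


# Hypothetical SKILL.md content; 'argument-hint' is an invented example key.
sample = """---
name: critique
description: Comprehensive multi-perspective review
argument-hint: hypothetical extra key
---
# Body
"""

print(unknown_frontmatter_keys(sample))
```

The check only inspects top-level keys, so anything moved under `metadata`, as the warning suggests, would no longer be flagged.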

Repository: NeoLabHQ/context-engineering-kit
Path: plugins/reflexion/skills/critique/SKILL.md
Commit: 3711edf

Reviewed: 1 day ago
