
judge-with-debate

Evaluate solutions through multi-round debate between independent judges until consensus

45

Quality

47%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Risky

Do not use without reviewing

Optimize this skill with Tessl

npx tessl skill review --optimize ./plugins/sadd/skills/judge-with-debate/SKILL.md

Quality

Content

62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill provides an extremely thorough and actionable guide for multi-agent debate-based evaluation, with excellent workflow clarity including decision points, consensus criteria, and feedback loops. However, it is severely over-verbose — repeating the same information across the process description, orchestration instructions, best practices, and example sections. The content would benefit greatly from aggressive deduplication and splitting into supporting files.

Suggestions

Reduce content by at least 50% by eliminating redundancy: the orchestration steps (Steps 1-7) largely duplicate the Phase descriptions above them — merge into a single authoritative sequence.

Move prompt templates into separate referenced files (e.g., META_JUDGE_PROMPT.md, JUDGE_PROMPT.md, DEBATE_PROMPT.md) to reduce inline bulk and improve progressive disclosure.

Remove explanatory text Claude already knows (e.g., 'Independence in initial analysis prevents groupthink', 'Garbage in, garbage out', explanations of what debate achieves) — these waste tokens without adding actionable guidance.

Consolidate the 'Best Practices', 'Common Pitfalls', and 'Do This' sections into a single compact checklist — currently they repeat constraints already stated in the process sections.

Dimension | Reasoning | Score

Conciseness

Extremely verbose at 350+ lines. Repetition is pervasive: the orchestration steps essentially restate the phases, the example walkthrough restates the process again, and the best practices restate constraints already mentioned. It also explains concepts Claude already knows (what debate is, what consensus means, why independence prevents groupthink). The meta-judge, judge, and debate prompt templates could each be much more compact.

1 / 3

Actionability

Highly actionable with concrete prompt templates, specific Task tool dispatch instructions, exact file naming conventions, precise consensus criteria (0.5 points overall, 1 point per criterion), and a detailed worked example showing the full flow. Claude would know exactly what to do at each step.

3 / 3

Workflow Clarity

Excellent multi-step workflow with clear sequencing (Phase 0 → 0.5 → 1 → 2 → 3), explicit decision points (consensus check with specific numeric thresholds), feedback loops (debate rounds with re-check), and clear termination conditions (max 3 rounds or consensus). The ASCII diagram and numbered orchestration steps make the flow unambiguous.

3 / 3

Progressive Disclosure

No bundle files are provided, so everything is inline in one massive file. The content would benefit from splitting prompt templates, the example walkthrough, and best practices into separate referenced files. The structure within the file is reasonable with clear sections, but the sheer volume of inline content makes it a borderline monolithic wall of text.

2 / 3

Total: 9 / 12

Passed
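
The consensus thresholds and termination condition cited in the table above (0.5 points overall, 1 point per criterion, at most 3 debate rounds) can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the score layout, the pairwise comparison, the inclusive thresholds, and the `debate_round` callback are all assumptions made for the example.

```python
from itertools import combinations

OVERALL_TOLERANCE = 0.5    # max gap between judges' overall scores (from the review)
CRITERION_TOLERANCE = 1.0  # max gap on any single criterion (from the review)
MAX_ROUNDS = 3             # debate stops after this many rounds (from the review)


def has_consensus(scores):
    """Return True if every pair of judges agrees within both tolerances.

    scores: one dict per judge (assumed layout), e.g.
        {"overall": 4.0, "criteria": {"correctness": 4}}
    """
    for a, b in combinations(scores, 2):
        if abs(a["overall"] - b["overall"]) > OVERALL_TOLERANCE:
            return False
        for name in a["criteria"]:
            if abs(a["criteria"][name] - b["criteria"][name]) > CRITERION_TOLERANCE:
                return False
    return True


def run_debate(initial_scores, debate_round):
    """Repeat debate rounds until consensus or MAX_ROUNDS is reached.

    debate_round is a hypothetical callback that takes the current scores
    and returns the judges' revised scores.
    Returns (final_scores, rounds_used, agreed).
    """
    scores = initial_scores
    rounds = 0
    while not has_consensus(scores) and rounds < MAX_ROUNDS:
        scores = debate_round(scores)
        rounds += 1
    return scores, rounds, has_consensus(scores)
```

Checking consensus before the first round means judges who already agree after their independent analysis skip the debate entirely; whether the skill itself behaves this way is not stated in the review.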

Description

32%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description conveys a unique evaluation methodology (multi-round debate with judges reaching consensus) but is too terse and lacks explicit trigger guidance. It doesn't specify what types of solutions it evaluates, what the output looks like, or when Claude should choose this skill over other evaluation approaches.

Suggestions

Add a 'Use when...' clause with natural trigger terms, e.g., 'Use when the user wants to rigorously compare multiple solution options, needs adversarial evaluation, or asks for a debate-style assessment of alternatives.'

List specific concrete actions such as 'Assigns independent judge perspectives, runs structured debate rounds, synthesizes scoring rationale, and produces a final consensus recommendation.'

Clarify the domain or scope—what kinds of 'solutions' does this evaluate (code solutions, design proposals, strategic options)? This would reduce conflict risk with other evaluation skills.

Dimension | Reasoning | Score

Specificity

It names a specific approach ('multi-round debate between independent judges until consensus') and a general action ('evaluate solutions'), but doesn't list multiple concrete actions or detail what kinds of solutions, what the output looks like, or what steps are involved.

2 / 3

Completeness

The description answers 'what' at a high level (evaluate solutions through debate) but completely lacks a 'Use when...' clause or any explicit trigger guidance for when Claude should select this skill. Per rubric guidelines, missing 'Use when' caps completeness at 2, and the 'what' is also weak, so this scores a 1.

1 / 3

Trigger Term Quality

Terms like 'evaluate', 'debate', 'judges', and 'consensus' are somewhat relevant but not strongly natural user terms. Users are more likely to say things like 'compare options', 'review approaches', or 'which solution is best' rather than 'multi-round debate between judges'.

2 / 3

Distinctiveness Conflict Risk

The 'multi-round debate between independent judges' mechanism is somewhat distinctive, but 'evaluate solutions' is generic enough to overlap with any evaluation, comparison, or decision-making skill. The lack of domain specificity increases conflict risk.

2 / 3

Total: 7 / 12

Passed
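
Two of the gaps named above, the missing "Use when..." clause and the lack of natural trigger terms, are easy to check mechanically before publishing a description. The sketch below is a hypothetical pre-flight check, not Tessl's rubric: the function name is invented, and the trigger-term list is only a small sample drawn from the phrases quoted in the review.

```python
import re

# Sample of natural user phrasings, taken from the review's examples.
NATURAL_TERMS = ("compare", "review", "which solution is best")


def description_gaps(description):
    """Return a list of gaps found in a skill description (hypothetical check)."""
    gaps = []
    if not re.search(r"\buse when\b", description, re.IGNORECASE):
        gaps.append("no 'Use when...' clause")
    lowered = description.lower()
    if not any(term in lowered for term in NATURAL_TERMS):
        gaps.append("no natural trigger terms")
    return gaps


current = ("Evaluate solutions through multi-round debate between "
           "independent judges until consensus")
print(description_gaps(current))
```

Run against the current description, the check reports both gaps; a description that adopts the suggested 'Use when the user wants to rigorously compare multiple solution options...' wording reports none.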

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

Criteria | Description | Result

frontmatter_unknown_keys

Unknown frontmatter key(s) found; consider removing or moving to metadata

Warning

Total: 10 / 11

Passed
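
The `frontmatter_unknown_keys` warning can be reproduced locally with a small check. The sketch below is an assumption-laden illustration: the allowed key set (`name`, `description`, `metadata`) is a guess rather than the validator's actual list, the parser only handles simple top-level `key: value` lines rather than full YAML, and `argument-hint` in the usage example is a made-up key.

```python
def unknown_frontmatter_keys(skill_md, allowed=("name", "description", "metadata")):
    """Return top-level frontmatter keys not in `allowed` (assumed key set)."""
    lines = skill_md.splitlines()
    # Frontmatter must open on the first line with '---'.
    if not lines or lines[0].strip() != "---":
        return []
    unknown = []
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the frontmatter
            break
        # Skip blanks, comments, and indented (nested) lines.
        if not line or line[0].isspace() or line.lstrip().startswith("#"):
            continue
        if ":" in line:
            key = line.split(":", 1)[0].strip()
            if key not in allowed:
                unknown.append(key)
    return unknown


example = """---
name: judge-with-debate
description: Evaluate solutions through multi-round debate
argument-hint: hypothetical value
metadata:
  author: someone
---
Body text: not part of the frontmatter.
"""
print(unknown_frontmatter_keys(example))
```

Keys nested under `metadata` are ignored, which matches the warning's advice to move unrecognized keys there.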

Repository
NeoLabHQ/context-engineering-kit
Reviewed
