Execute tasks through competitive multi-agent generation, meta-judge evaluation specification, multi-judge evaluation, and evidence-based synthesis
Key features:
CRITICAL: You are the orchestrator, not an implementation agent or a judge. You should not read files that are provided as context for sub-agents or the task, you should not read reports, and you should not overwhelm your context with unnecessary information. You MUST follow the process step by step. Any deviation will be considered a failure!
This command implements a multi-phase adaptive competitive orchestration pattern:
```
Phase 1: Competitive Generation with Self-Critique + Meta-Judge (IN PARALLEL)

         ┌─ Meta-Judge → Evaluation Specification YAML ───────────┐
Task ────┼─ Agent 1 → Draft → Critique → Revise → Solution A ───┐ │
         ├─ Agent 2 → Draft → Critique → Revise → Solution B ───┼─┤
         └─ Agent 3 → Draft → Critique → Revise → Solution C ───┘ │
                                                                  │
Phase 2: Multi-Judge Evaluation with Verification                 │
         ┌─ Judge 1 → Evaluate → Verify → Revise → Report A ─┐    │
         ├─ Judge 2 → Evaluate → Verify → Revise → Report B ─┼────┤
         └─ Judge 3 → Evaluate → Verify → Revise → Report C ─┘    │
                                                                  │
Phase 2.5: Adaptive Strategy Selection                            │
Analyze Consensus ────────────────────────────────────────────────┤
├─ Clear Winner?       → SELECT_AND_POLISH                        │
├─ All Flawed (<3.0)?  → REDESIGN (return to Phase 1)             │
└─ Split Decision?     → FULL_SYNTHESIS                           │
                                   │                              │
Phase 3: Evidence-Based Synthesis  │                              │
(Only if FULL_SYNTHESIS)           │                              │
Synthesizer ───────────────────────┴──────────────────────────────┴─→ Final Solution
```

Before starting, ensure the reports directory exists:
```bash
mkdir -p .specs/reports
```

Report naming convention: .specs/reports/{solution-name}-{YYYY-MM-DD}.[1|2|3].md
Where:
- {solution-name} - Derived from the output path (e.g., users-api from output specs/api/users.md)
- {YYYY-MM-DD} - Current date
- [1|2|3] - Judge number

Note: Solutions remain in their specified output locations; only evaluation reports go to .specs/reports/
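To make the convention concrete, here is a minimal Python sketch of composing a judge-report path. The helper name report_path is hypothetical, and how {solution-name} is derived from the output path is left to the orchestrator (as in the example above); this is an illustration, not part of the skill.

```python
# Illustrative only: composing a judge-report path under the naming convention above.
from datetime import date

def report_path(solution_name: str, judge_number: int, day: date | None = None) -> str:
    d = (day or date.today()).strftime("%Y-%m-%d")
    return f".specs/reports/{solution_name}-{d}.{judge_number}.md"

# e.g. report_path("users-api", 1, date(2025, 1, 15))
#      -> ".specs/reports/users-api-2025-01-15.1.md"
```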
Launch 3 independent generator agents AND 1 meta-judge agent in parallel (4 agents total; recommended model: Opus for all, for quality):
The meta-judge runs in parallel with the 3 generators because it does not need their output — it only needs the task description to generate evaluation criteria.
CRITICAL: Dispatch all 4 agents in a single message using 4 Task tool calls as foreground agents. The meta-judge MUST be the first tool call in the dispatch order, because it needs time to collect context from the codebase before the generators modify it.
The meta-judge generates an evaluation specification YAML (rubrics, checklists, scoring criteria) tailored to this specific task. It returns the evaluation specification YAML that all 3 judges will use.
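For illustration, a minimal sketch of the kind of structure such a specification might take. The field names below (criteria, weight, checklist, scoring) are assumptions for this example only, not a schema defined by this skill; the sample task text is taken from the REST API example later in this document.

```python
# Hypothetical evaluation-specification structure; the keys shown here are
# illustrative assumptions, not a fixed schema produced by the meta-judge.
import yaml  # PyYAML

example_spec = yaml.safe_load("""
task: "Design REST API for user management"
criteria:
  - name: restfulness
    weight: 0.3
    checklist:
      - "Resources are nouns; actions map to HTTP verbs"
      - "Status codes are used consistently"
  - name: security
    weight: 0.4
    checklist:
      - "Authentication is addressed for every endpoint"
scoring:
  scale: "1.0 to 5.0"
  evidence_required: true
""")

print(example_spec["criteria"][0]["name"])  # -> restfulness
```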
Prompt template for meta-judge:
## Task
Generate an evaluation specification yaml for the following task. You will produce rubrics, checklists, and scoring criteria that judge agents will use to evaluate and compare competitive implementation artifacts.
CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`
## User Prompt
{Original task description from user}
## Context
{Any relevant codebase context, file paths, constraints}
## Artifact Type
{code | documentation | configuration | etc.}
## Number of Solutions
3 (competitive implementations to be compared)
## Instructions
Return only the final evaluation specification YAML in your response.
The specification should support comparative evaluation across multiple solutions.

Dispatch:
Use Task tool:
- description: "Meta-judge: {brief task summary}"
- prompt: {meta-judge prompt}
- model: opus
- subagent_type: "sadd:meta-judge"{solution-file}.[a|b|c].[ext])Solution naming convention: {solution-file}.[a|b|c].[ext]
Where:
- {solution-file} - Derived from the task (e.g., a task to create users.ts results in users as the solution file)
- [a|b|c] - Unique identifier per sub-agent
- [ext] - File extension (e.g., md, ts, etc.)

Key principle: Diversity through independence - agents explore different approaches.
CRITICAL: You MUST provide the filename with the [a|b|c] identifier to agents and judges! Omitting it will result in your immediate TERMINATION!
Prompt template for generators:
<task>
{task_description}
</task>
<constraints>
{constraints_if_any}
</constraints>
<context>
{relevant_context}
</context>
<output>
{define the expected output following the pattern {solution-file}.[a|b|c].[ext], based on the task description and context. Each [a|b|c] is a unique identifier per sub-agent. You MUST provide the filename with it!}
</output>
Instructions:
Let's approach this systematically to produce the best possible solution.
1. First, analyze the task carefully - what is being asked and what are the key requirements?
2. Consider multiple approaches - what are the different ways to solve this?
3. Think through the tradeoffs step by step and choose the approach you believe is best
4. Implement it completely
5. Generate 5 verification questions about critical aspects
6. Answer your own questions:
- Review solution against each question
- Identify gaps or weaknesses
7. Revise solution:
- Fix identified issues
8. Explain what was changed and why

Send ALL 4 Task tool calls in a single message. Meta-judge first, then generators:
Message with 4 tool calls:
Tool call 1 (meta-judge):
- description: "Meta-judge: {brief task summary}"
- model: opus
- subagent_type: "sadd:meta-judge"
Tool call 2 (generator A):
- description: "Generate solution A: {brief task summary}"
- model: opus
Tool call 3 (generator B):
- description: "Generate solution B: {brief task summary}"
- model: opus
Tool call 4 (generator C):
- description: "Generate solution C: {brief task summary}"
- model: opus

Wait for ALL 4 to return before proceeding to Phase 2.
Launch 3 independent judges in parallel (recommended: Opus for rigor):
CRITICAL: Wait for ALL Phase 1 agents (meta-judge + 3 generators) to complete before dispatching judges.
CRITICAL: Provide each judge with the EXACT meta-judge evaluation specification YAML. Do not skip or add anything, do not modify it in any way, and do not shorten or summarize any text in it!
Judge reports: .specs/reports/{solution-name}-{date}.[1|2|3].md

Key principle: Multiple independent evaluations reduce bias and catch different issues.
Prompt template for judges:
You are evaluating {number} competitive solutions against an evaluation specification produced by the meta-judge.
CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`
## Task
{task_description}
## Solutions
{list of paths to all candidate solutions}
## Evaluation Specification
```yaml
{meta-judge's evaluation specification YAML}
```

Write full report to: .specs/reports/{solution-name}-{date}.[1|2|3].md (each judge gets a unique number identifier)
CRITICAL: You must reply with this exact structured header format:
```
VOTE: [Solution A/B/C]
SCORES:
  Solution A: [X.X]/5.0
  Solution B: [X.X]/5.0
  Solution C: [X.X]/5.0
CRITERIA:
[Summary of your evaluation]
```
Follow your full judge process as defined in your agent instructions!
CRITICAL: Base your evaluation on evidence, not impressions. Quote specific text.
CRITICAL: You must reply with this exact structured evaluation report format in YAML at the START of your response!
CRITICAL: NEVER provide the score threshold to judges. A judge MUST NOT know what the score threshold is, so that it is not biased!
**Dispatch:** Use the Task tool (3 calls in a single message):
### Phase 2.5: Adaptive Strategy Selection (Early Return)
**The orchestrator** (not a subagent) analyzes judge outputs to determine the optimal strategy.
#### Decision Logic
**Step 1: Parse the structured headers from the judges' replies**
Parse the reply of each judge (a minimal parsing sketch follows below).
CRITICAL: Do not read the report files themselves; they can overflow your context.
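A minimal sketch of what this parsing step amounts to, assuming the reply header follows the exact template defined in Phase 2; the orchestrator applies this logic itself, and no script is shipped with this skill.

```python
# Sketch only: extract VOTE and SCORES from a judge reply header of the form
# "VOTE: Solution A ... Solution A: 4.5/5.0 Solution B: 3.2/5.0 Solution C: 2.8/5.0".
import re

def parse_judge_header(reply: str) -> tuple[str, dict[str, float]]:
    vote_match = re.search(r"VOTE:\s*Solution\s+([ABC])", reply)
    if vote_match is None:
        raise ValueError("Judge reply is missing the required VOTE header")
    scores = {
        m.group(1): float(m.group(2))
        for m in re.finditer(r"Solution\s+([ABC]):\s*(\d+(?:\.\d+)?)/5\.0", reply)
    }
    return vote_match.group(1), scores
```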
**Step 2: Check for unanimous winner**
Compare all three VOTE values:
- If Judge 1 VOTE = Judge 2 VOTE = Judge 3 VOTE (same solution):
- **Strategy: SELECT_AND_POLISH**
- **Reason:** Clear consensus - all three judges prefer same solution
**Step 3: Check if all solutions are fundamentally flawed**
If no unanimous vote, calculate average scores:
1. Average Solution A scores: (Judge1_A + Judge2_A + Judge3_A) / 3
2. Average Solution B scores: (Judge1_B + Judge2_B + Judge3_B) / 3
3. Average Solution C scores: (Judge1_C + Judge2_C + Judge3_C) / 3
If (avg_A < 3.0) AND (avg_B < 3.0) AND (avg_C < 3.0):
- **Strategy: REDESIGN**
- **Reason:** All solutions below quality threshold, fundamental approach issues
**Step 4: Default to full synthesis**
If none of the above conditions met:
- **Strategy: FULL_SYNTHESIS**
- **Reason:** Split decision with merit, synthesis needed to combine best elements
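The decision rules above can be summarized in a short illustrative sketch; the thresholds and strategy names come directly from this section, and the example data at the end reuses the scores from the authentication example later in this document.

```python
# Illustrative decision logic for Phase 2.5, mirroring Steps 2-4 above.
def select_strategy(votes: list[str], scores: list[dict[str, float]]) -> str:
    # Step 2: unanimous winner across all three judges
    if len(set(votes)) == 1:
        return "SELECT_AND_POLISH"
    # Step 3: average each solution's score across judges
    averages = {
        solution: sum(judge[solution] for judge in scores) / len(scores)
        for solution in ("A", "B", "C")
    }
    if all(avg < 3.0 for avg in averages.values()):
        return "REDESIGN"
    # Step 4: split decision with merit
    return "FULL_SYNTHESIS"

# Split votes and all averages below 3.0 -> REDESIGN
assert select_strategy(
    ["A", "B", "C"],
    [{"A": 2.5, "B": 2.2, "C": 2.3},
     {"A": 2.4, "B": 2.8, "C": 2.1},
     {"A": 2.6, "B": 2.5, "C": 2.3}],
) == "REDESIGN"
```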
#### Strategy 1: SELECT_AND_POLISH
**When:** Clear winner (unanimous votes)
**Process:**
1. Select the winning solution as the base
2. Launch subagent to apply specific improvements from judge feedback
3. Cherry-pick 1-2 best elements from runner-up solutions
4. Document what was added and why
**Benefits:**
- Saves synthesis cost (simpler than full synthesis)
- Preserves proven quality of winning solution
- Focused improvements rather than full reconstruction
**Prompt template:**
```markdown
You are polishing the winning solution based on judge feedback.
<task>
{task_description}
</task>
<winning_solution>
{path_to_winning_solution}
Score: {winning_score}/5.0
Judge consensus: {why_it_won}
</winning_solution>
<runner_up_solutions>
{list of paths to all runner-up solutions}
</runner_up_solutions>
<judge_feedback>
{list of paths to all evaluation reports}
</judge_feedback>
<output>
{final_solution_path}
</output>
Instructions:
Let's work through this step by step to polish the winning solution effectively.
1. Take the winning solution as your base (do NOT rewrite it)
2. First, carefully review all judge feedback to understand what needs improvement
3. Apply improvements based on judge feedback:
- Fix identified weaknesses
- Add missing elements judges noted
4. Next, examine the runner-up solutions for standout elements
5. Cherry-pick 1-2 specific elements from runners-up if judges praised them
6. Document changes made:
- What was changed and why
- What was added from other solutions
CRITICAL: Preserve the winning solution's core approach. Make targeted improvements only.
```

#### Strategy 2: REDESIGN
**When:** All solutions scored <3.0/5.0 (fundamental issues across the board)
**Process:**
Prompt template for new implementation:
You are analyzing why all solutions failed to meet quality standards, and you will implement a new solution based on that analysis.
<task>
{task_description}
</task>
<constraints>
{constraints_if_any}
</constraints>
<context>
{relevant_context}
</context>
<failed_solutions>
{list of paths to all candidate solutions}
</failed_solutions>
<evaluation_reports>
{list of paths to all evaluation reports with low scores}
</evaluation_reports>
Instructions:
Let's break this down systematically to understand what went wrong and how to design a new solution based on it.
1. First, analyze the task carefully - what is being asked and what are the key requirements?
2. Read through each solution and its evaluation report
3. For each solution, think step by step about:
- What was the core approach?
- What specific issues did judges identify?
- Why did this approach fail to meet the quality threshold?
4. Identify common failure patterns across all solutions:
- Are there shared misconceptions?
- Are there missing requirements that all solutions overlooked?
- Are there fundamental constraints that weren't considered?
5. Extract lessons learned:
- What approaches should be avoided?
- What constraints must be addressed?
6. Generate improved guidance for the next iteration:
- New constraints to add
- Specific approaches to try - what are the different ways to solve this?
- Key requirements to emphasize
7. Think through the tradeoffs step by step and choose the approach you believe is best
8. Implement it completely
9. Generate 5 verification questions about critical aspects
10. Answer your own questions:
- Review solution against each question
- Identify gaps or weaknesses
11. Revise solution:
- Fix identified issues
12. Explain what was changed and why

#### Strategy 3: FULL_SYNTHESIS
**When:** No clear winner AND solutions have merit (scores >= 3.0)
**Process:** Proceed to Phase 3 (Evidence-Based Synthesis)
Only executed when Strategy 3 (FULL_SYNTHESIS) selected in Phase 2.5
Launch 1 synthesis agent (recommended: Opus for quality):
Key principle: Evidence-based synthesis leverages collective intelligence.
Prompt template for synthesizer:
You are synthesizing the best solution from competitive implementations and evaluations.
<task>
{task_description}
</task>
<solutions>
{list of paths to all candidate solutions}
</solutions>
<evaluation_reports>
{list of paths to all evaluation reports}
</evaluation_reports>
<output>
{define the expected output path (e.g., solution.md) based on the task description and context. The result should be a complete solution to the task.}
</output>
Instructions:
Let's think through this synthesis step by step to create the best possible combined solution.
1. First, read all solutions and evaluation reports carefully
2. Map out the consensus:
- What strengths did multiple judges praise in each solution?
- What weaknesses did multiple judges criticize in each solution?
3. For each major component or section, think through:
- Which solution handles this best and why?
- Could a hybrid approach work better?
4. Create the best possible solution by:
- Copying text directly when one solution is clearly superior
- Combining approaches when a hybrid would be better
- Fixing all identified issues
- Preserving the best elements from each
5. Explain your synthesis decisions:
- What you took from each solution
- Why you made those choices
- How you addressed identified weaknesses
CRITICAL: Do not create something entirely new. Synthesize the best from what exists.

Outputs:
- Candidate solutions: {solution-file}.[a|b|c].[ext] (in the specified output location)
- Evaluation reports: .specs/reports/{solution-name}-{date}.[1|2|3].md
- Final solution: {output_path}

Once command execution is complete, reply to the user with the following structure:
## Execution Summary
Original Task: {task_description}
Strategy Used: {strategy} ({reason})
### Results
| Phase | Agents | Models | Status |
|-------------------------|--------|----------|-------------|
| Phase 1: Competitive Generation + Meta-Judge | 4 (3 generators + 1 meta-judge) | opus x 4 | [Complete / Failed] |
| Phase 2: Multi-Judge Evaluation | 3 | opus x 3 | [Complete / Failed] |
| Phase 2.5: Adaptive Strategy Selection | orchestrator | - | {strategy} |
| Phase 3: [Synthesis/Polish/Redesign] | [N] | [model] | [Complete / Failed] |
### Files Created
Final Solution:
- {output_path} - Synthesized production-ready command
Candidate Solutions:
- {solution-file}.[a|b|c].[ext] (Score: [X.X]/5.0)
Evaluation Reports:
- .specs/reports/{solution-file}-{date}.[1|2|3].md (Vote: [Solution A/B/C])
### Synthesis Decisions
| Element | Source | Rationale |
|----------------------|------------------|-------------|
| [element] | Solution [B/A/C] | [rationale] |

Do:
/do-competitively "Design REST API for user management (CRUD + auth)" \
--output "specs/api/users.md" \
--criteria "RESTfulness,security,scalability,developer-experience"Phase 1 outputs (4 parallel agents):
- specs/api/users.a.md - Resource-based design with nested routes
- specs/api/users.b.md - Action-based design with RPC-style endpoints
- specs/api/users.c.md - Minimal design, missing auth consideration

Phase 2 outputs (assuming date 2025-01-15, 3 judges using meta-judge specification):
.specs/reports/users-api-2025-01-15.1.md:
VOTE: Solution A
SCORES: A=4.5/5.0, B=3.2/5.0, C=2.8/5.0
"Most RESTful, good security"
.specs/reports/users-api-2025-01-15.2.md:
VOTE: Solution A
SCORES: A=4.3/5.0, B=3.5/5.0, C=2.6/5.0
"Clean resource design, scalable"
.specs/reports/users-api-2025-01-15.3.md:
VOTE: Solution A
SCORES: A=4.6/5.0, B=3.0/5.0, C=2.9/5.0
"Best practices, clear structure"
Phase 2.5 decision (orchestrator parses headers):
Unanimous votes: A, A, A
Strategy: SELECT_AND_POLISH
Reason: Clear consensus - all three judges prefer Solution A
Phase 3 output:
specs/api/users.md - Solution A polished with:
/do-competitively "Design caching strategy for high-traffic API" \
--output "specs/caching.md" \
--criteria "performance,memory-efficiency,simplicity,reliability"Phase 1 outputs (4 parallel agents):
- specs/caching.a.md - Redis with LRU eviction
- specs/caching.b.md - Multi-tier cache (memory + Redis)
- specs/caching.c.md - CDN + application cache

Phase 2 outputs (assuming date 2025-01-15, 3 judges using meta-judge specification):
.specs/reports/caching-2025-01-15.1.md:
VOTE: Solution B
SCORES: A=3.8/5.0, B=4.2/5.0, C=3.9/5.0
"Best performance, complex"
.specs/reports/caching-2025-01-15.2.md:
VOTE: Solution A
SCORES: A=4.0/5.0, B=3.9/5.0, C=3.7/5.0
"Simple, reliable, proven"
.specs/reports/caching-2025-01-15.3.md:
VOTE: Solution C
SCORES: A=3.6/5.0, B=4.0/5.0, C=4.1/5.0
"Global reach, cost-effective"
Phase 2.5 decision (orchestrator parses headers):
Split votes: B, A, C (no consensus)
Average scores: A=3.8, B=4.0, C=3.9 (all >=3.0)
Strategy: FULL_SYNTHESIS
Reason: Split decision with merit, synthesis needed to combine best elements
Phase 3 output:
specs/caching.md - Hybrid approach:
/do-competitively "Design authentication system with social login" \
--output "specs/auth.md" \
--criteria "security,user-experience,maintainability"Phase 1 outputs (4 parallel agents):
- specs/auth.a.md - Custom OAuth2 implementation
- specs/auth.b.md - Session-based with social providers
- specs/auth.c.md - JWT with password-only auth

Phase 2 outputs (assuming date 2025-01-15, 3 judges using meta-judge specification):
.specs/reports/auth-2025-01-15.1.md:
VOTE: Solution A
SCORES: A=2.5/5.0, B=2.2/5.0, C=2.3/5.0
"Security risks, reinventing wheel"
.specs/reports/auth-2025-01-15.2.md:
VOTE: Solution B
SCORES: A=2.4/5.0, B=2.8/5.0, C=2.1/5.0
"Sessions don't scale, missing requirements"
.specs/reports/auth-2025-01-15.3.md:
VOTE: Solution C
SCORES: A=2.6/5.0, B=2.5/5.0, C=2.3/5.0
"No social login, security concerns"
Phase 2.5 decision (orchestrator parses headers):
Split votes: A, B, C (no consensus)
Average scores: A=2.5, B=2.5, C=2.2 (ALL <3.0)
Strategy: REDESIGN
Reason: All solutions below 3.0 threshold, fundamental issues
Do not stop; return to Phase 1 and eventually finish with the SELECT_AND_POLISH or FULL_SYNTHESIS strategy.