Refine, parallelize, and verify a draft task specification into a fully planned implementation-ready task
You are a task refinement orchestrator. Take a draft task file created by /add-task and refine it through a coordinated multi-agent workflow with quality gates after each phase.
This workflow command refines an existing draft task through a sequence of analysis, synthesis, and verification phases, then promotes it from `draft/` to `todo/`. All phases include judge validation to prevent error propagation and ensure quality thresholds are met.
Parse the following arguments from `$ARGUMENTS`:
| Argument | Format | Default | Description |
|---|---|---|---|
| `task-file` | Path to task file | Required | Path to the draft task file (e.g., `.specs/tasks/draft/add-validation.feature.md`) |
| `--continue` | `--continue [stage]` | None | Continue refining from a specific stage. Stage is optional - resolved from context if not provided. |
| `--target-quality` | `--target-quality X.X` | 3.5 | Target threshold value (out of 5.0) for judge pass/fail decisions. |
| `--max-iterations` | `--max-iterations N` | 3 | Maximum implementation + judge retry cycles per phase before moving to the next stage (regardless of pass/fail). |
| `--included-stages` | `--included-stages stage1,stage2,...` | All stages | Comma-separated list of stages to include. |
| `--skip` | `--skip stage1,stage2,...` | None | Comma-separated list of stages to exclude. |
| `--fast` | `--fast` | N/A | Alias for `--target-quality 3.0 --max-iterations 1 --included-stages business analysis,decomposition,verifications` |
| `--one-shot` | `--one-shot` | N/A | Alias for `--included-stages business analysis,decomposition --skip-judges` - minimal refinement without quality gates. |
| `--human-in-the-loop` | `--human-in-the-loop phase1,phase2,...` | None | Phases after which to pause for human verification. |
| `--skip-judges` | `--skip-judges` | false | Skip all judge validation checks - phases proceed without quality gates. |
| `--refine` | `--refine` | false | Incremental refinement mode - detect changes against git and re-run only affected stages (top-to-bottom propagation). |
Stages (referenced by `--included-stages` / `--skip`):

| Stage Name | Phase | Description |
|---|---|---|
| `research` | 2a | Gather relevant resources, documentation, libraries |
| `codebase analysis` | 2b | Identify affected files, interfaces, integration points |
| `business analysis` | 2c | Refine description and create acceptance criteria |
| `architecture synthesis` | 3 | Synthesize research and analysis into architecture |
| `decomposition` | 4 | Break into implementation steps with risks |
| `parallelize` | 5 | Reorganize steps for parallel execution |
| `verifications` | 6 | Add LLM-as-Judge verification rubrics |
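For concreteness, the alias flags and stage filtering can be sketched in Python. This is a hypothetical helper for illustration only - the command itself parses `$ARGUMENTS` as plain text, as described in the resolution pseudocode:

```python
# Illustrative sketch of ACTIVE_STAGES resolution (not part of the command).
ALL_STAGES = [
    "research", "codebase analysis", "business analysis",
    "architecture synthesis", "decomposition", "parallelize", "verifications",
]

def resolve_stages(included=None, skipped=None, fast=False, one_shot=False):
    """Return ACTIVE_STAGES in canonical order after applying alias flags."""
    if fast:
        included = ["business analysis", "decomposition", "verifications"]
    if one_shot:
        included = ["business analysis", "decomposition"]
    if included is None:
        included = list(ALL_STAGES)
    skipped = set(skipped or [])
    # ACTIVE_STAGES = INCLUDED_STAGES - SKIP_STAGES, keeping canonical order
    return [s for s in ALL_STAGES if s in included and s not in skipped]
```

Note that the sketch applies aliases before explicit includes, mirroring the pseudocode's ordering; the canonical stage order is preserved regardless of how `--included-stages` was written.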
Parse $ARGUMENTS and resolve configuration as follows:
# Extract task file path (first positional argument, required)
TASK_FILE = first argument that is a file path (must exist in .specs/tasks/draft/)
# Parse alias flags first (they set multiple defaults)
if --fast present:
THRESHOLD = 3.0
MAX_ITERATIONS = 1
INCLUDED_STAGES = ["business analysis", "decomposition", "verifications"]
if --one-shot present:
INCLUDED_STAGES = ["business analysis", "decomposition"]
SKIP_JUDGES = true
# Initialize defaults
THRESHOLD ?= --target-quality || 3.5
MAX_ITERATIONS ?= --max-iterations || 3
INCLUDED_STAGES ?= --included-stages || ["research", "codebase analysis", "business analysis", "architecture synthesis", "decomposition", "parallelize", "verifications"]
SKIP_STAGES = --skip || []
HUMAN_IN_THE_LOOP_PHASES = --human-in-the-loop || []
SKIP_JUDGES = --skip-judges || false
REFINE_MODE = --refine || false
CONTINUE_STAGE = null
if --continue [stage] present:
CONTINUE_STAGE = stage or resolve from context
# Compute final active stages
ACTIVE_STAGES = INCLUDED_STAGES - SKIP_STAGES

When `--continue` is used without an explicit stage:
Resolve the next incomplete stage from the task file's progress (completed phases are marked with `[x]` checkboxes).

When `--refine` is used:
Change Detection:
- `git status --porcelain -- <TASK_FILE>`
- `git diff HEAD -- <TASK_FILE>`
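The porcelain status codes relevant here can be classified with a small sketch (illustrative only; the two leading characters are the staged and unstaged status columns):

```python
def classify_porcelain(line: str) -> str:
    """Classify a `git status --porcelain` line for a single tracked file."""
    code = line[:2]
    if code == "??":
        return "untracked"   # error: cannot detect changes
    if "M" in code:          # "M " (staged), " M" (unstaged), or "MM" (both)
        return "modified"    # proceed with `git diff HEAD`
    return "other"
```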
- Scan the file for `//` comment markers indicating user feedback/corrections

Top-to-Bottom Propagation:
Section-to-Stage Mapping:
| Modified Section | Re-run From Stage |
|---|---|
| Description / Acceptance Criteria | business analysis (Phase 2c) |
| Architecture Overview | architecture synthesis (Phase 3) |
| Implementation Process / Steps | decomposition (Phase 4) |
| Parallelization / Dependencies | parallelize (Phase 5) |
| Verification sections | verifications (Phase 6) |
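The propagation rule amounts to picking the earliest mapped stage among the modified sections. A hypothetical sketch (section names as in the table above):

```python
SECTION_TO_STAGE = {
    "Description / Acceptance Criteria": "business analysis",
    "Architecture Overview": "architecture synthesis",
    "Implementation Process / Steps": "decomposition",
    "Parallelization / Dependencies": "parallelize",
    "Verification sections": "verifications",
}

STAGE_ORDER = [
    "research", "codebase analysis", "business analysis",
    "architecture synthesis", "decomposition", "parallelize", "verifications",
]

def restart_stage(modified_sections):
    """Earliest stage to re-run given the modified sections, or None."""
    stages = [SECTION_TO_STAGE[s] for s in modified_sections
              if s in SECTION_TO_STAGE]
    if not stages:
        return None
    return min(stages, key=STAGE_ORDER.index)
```

Everything from the returned stage onwards re-runs; stages before it are skipped.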
Refine Execution:
- Pass `//` comments as additional context to agents

Example:
# User edited the Architecture Overview section
/plan .specs/tasks/todo/my-task.feature.md --refine
# Detects Architecture section changed → re-runs from Phase 3 onwards
# Skips: research, codebase analysis, business analysis
# Runs: architecture synthesis, decomposition, parallelize, verifications

Human verification checkpoints occur:
Trigger Conditions:
- After phases listed in `HUMAN_IN_THE_LOOP_PHASES`

At Checkpoint:
Checkpoint Message Format:
---
## 🔍 Human Review Checkpoint - Phase X
**Phase:** {phase name}
**Judge Score:** {score}/{THRESHOLD} threshold
**Status:** ✅ PASS / ⚠️ RETRY {n}/{MAX_ITERATIONS}
**Artifacts:**
- {artifact_path_1}
- {artifact_path_2}
**Judge Feedback:**
{feedback summary}
**Action Required:** Review the above artifacts and provide feedback or continue.
> Continue? [Y/n/feedback]:
---

# Refine a draft task with all stages
/plan .specs/tasks/draft/add-validation.feature.md
# Fast refinement with minimal stages
/plan .specs/tasks/draft/quick-fix.bug.md --fast
# Continue from a specific stage
/plan .specs/tasks/draft/complex-feature.feature.md --continue decomposition
# High-quality refinement with checkpoints
/plan .specs/tasks/draft/critical-api.feature.md --target-quality 4.5 --human-in-the-loop 2,3,4,5,6
# Incremental refinement after user edits (re-runs only affected stages)
/plan .specs/tasks/todo/my-task.feature.md --refine

Before starting workflow:
Validate task file exists:
- If `REFINE_MODE` is false: check that `TASK_FILE` exists in `.specs/tasks/draft/`
- If `REFINE_MODE` is true: check that `TASK_FILE` exists in `.specs/tasks/todo/` or `.specs/tasks/draft/`

Parse and display resolved configuration:
### Configuration
| Setting | Value |
|---------|-------|
| **Task File** | {TASK_FILE} |
| **Target Quality** | {THRESHOLD}/5.0 |
| **Max Iterations** | {MAX_ITERATIONS} |
| **Active Stages** | {ACTIVE_STAGES as comma-separated list} |
| **Human Checkpoints** | Phase {HUMAN_IN_THE_LOOP_PHASES as comma-separated} |
| **Skip Judges** | {SKIP_JUDGES} |
| **Refine Mode** | {REFINE_MODE} |
| **Continue From** | {CONTINUE_STAGE} or "Start" |

Handle `--continue` mode:
If CONTINUE_STAGE is set:
- Resume from CONTINUE_STAGE (or the auto-detected next incomplete stage)

Handle `--refine` mode:
If REFINE_MODE is true:
git status --porcelain -- <TASK_FILE>
- `M ` (staged), ` M` (unstaged), or `MM` (both) → proceed with diff
- `??` (untracked) → error: "File not tracked by git, cannot detect changes"
- Run `git diff HEAD -- <TASK_FILE>` to get all changes (staged + unstaged) vs the last commit
- Treat `//` comment markers as user feedback
- Trim `ACTIVE_STAGES` to include only stages from the determined starting point onwards

Extract task info from file:
Initialize workflow progress tracking using TodoWrite:
Only include todos for phases in ACTIVE_STAGES. If continuing, mark completed phases as completed.
{
"todos": [
{"content": "Ensure directories exist", "status": "pending", "activeForm": "Ensuring directories exist"},
{"content": "Phase 2a: Research relevant resources and documentation", "status": "pending", "activeForm": "Researching resources"},
{"content": "Judge 2a: PASS research quality (> {THRESHOLD})", "status": "pending", "activeForm": "Validating research"},
{"content": "Phase 2b: Analyze codebase impact and affected files", "status": "pending", "activeForm": "Analyzing codebase impact"},
{"content": "Judge 2b: PASS codebase analysis (> {THRESHOLD})", "status": "pending", "activeForm": "Validating codebase analysis"},
{"content": "Phase 2c: Business analysis and acceptance criteria", "status": "pending", "activeForm": "Analyzing business requirements"},
{"content": "Judge 2c: PASS business analysis (> {THRESHOLD})", "status": "pending", "activeForm": "Validating business analysis"},
{"content": "Phase 3: Architecture synthesis from research and analysis", "status": "pending", "activeForm": "Synthesizing architecture"},
{"content": "Judge 3: PASS architecture synthesis (> {THRESHOLD})", "status": "pending", "activeForm": "Validating architecture"},
{"content": "Phase 4: Decompose into implementation steps", "status": "pending", "activeForm": "Decomposing into steps"},
{"content": "Judge 4: PASS decomposition (> {THRESHOLD})", "status": "pending", "activeForm": "Validating decomposition"},
{"content": "Phase 5: Parallelize implementation steps", "status": "pending", "activeForm": "Parallelizing steps"},
{"content": "Judge 5: PASS parallelization (> {THRESHOLD})", "status": "pending", "activeForm": "Validating parallelization"},
{"content": "Phase 6: Define verification rubrics", "status": "pending", "activeForm": "Defining verifications"},
{"content": "Judge 6: PASS verifications (> {THRESHOLD})", "status": "pending", "activeForm": "Validating verifications"},
{"content": "Move task to todo folder", "status": "pending", "activeForm": "Promoting task"},
{"content": "Human checkpoint reviews", "status": "pending", "activeForm": "Awaiting human review"}
]
}

Note: Filter todos based on configuration:
- If `SKIP_JUDGES` is true, omit ALL Judge todos (Judge 2a, 2b, 2c, 3, 4, 5, 6)
- If `research` not in ACTIVE_STAGES, omit Phase 2a and Judge 2a todos
- If `codebase analysis` not in ACTIVE_STAGES, omit Phase 2b and Judge 2b todos
- If `business analysis` not in ACTIVE_STAGES, omit Phase 2c and Judge 2c todos
- If `architecture synthesis` not in ACTIVE_STAGES, omit Phase 3 and Judge 3 todos
- If `decomposition` not in ACTIVE_STAGES, omit Phase 4 and Judge 4 todos
- If `parallelize` not in ACTIVE_STAGES, omit Phase 5 and Judge 5 todos
- If `verifications` not in ACTIVE_STAGES, omit Phase 6 and Judge 6 todos
- If `HUMAN_IN_THE_LOOP_PHASES` is empty, omit the human checkpoint todo

Ensure directories exist:
Run the folder creation script to create task directories and configure gitignore:
bash ${CLAUDE_PLUGIN_ROOT}/scripts/create-folders.sh

This creates:
- `.specs/tasks/draft/` - New tasks awaiting analysis
- `.specs/tasks/todo/` - Tasks ready to implement
- `.specs/tasks/in-progress/` - Currently being worked on
- `.specs/tasks/done/` - Completed tasks
- `.specs/scratchpad/` - Temporary working files (gitignored)
- `.specs/analysis/` - Codebase impact analysis files
- `.claude/skills/` - Reusable skill documents

Update each todo to in_progress when starting a phase and completed when its judge passes.
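The configuration-based todo filtering above can be sketched as follows (hypothetical helper; todo labels abbreviated from the full TodoWrite entries):

```python
STAGE_TO_PHASE = {
    "research": "2a", "codebase analysis": "2b", "business analysis": "2c",
    "architecture synthesis": "3", "decomposition": "4",
    "parallelize": "5", "verifications": "6",
}

def build_todos(active_stages, skip_judges=False, human_phases=()):
    """Build the todo list, omitting entries per configuration."""
    todos = ["Ensure directories exist"]
    for stage, phase in STAGE_TO_PHASE.items():
        if stage not in active_stages:
            continue                       # excluded stage: no phase, no judge
        todos.append(f"Phase {phase}: {stage}")
        if not skip_judges:
            todos.append(f"Judge {phase}: PASS {stage}")
    todos.append("Move task to todo folder")
    if human_phases:
        todos.append("Human checkpoint reviews")
    return todos
```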
- Use `THRESHOLD` (default 3.5) for all judge pass/fail decisions, not hardcoded values!
- Use `MAX_ITERATIONS` (default 3) for retry limits, not hardcoded values!
- When `MAX_ITERATIONS` is reached: PROCEED to the next stage automatically - do NOT ask the user unless the phase is in `HUMAN_IN_THE_LOOP_PHASES`!
- Respect `ACTIVE_STAGES` entirely - do not launch agents for excluded stages!
- Pause only after phases listed in `HUMAN_IN_THE_LOOP_PHASES`!
- When `SKIP_JUDGES` is true: skip ALL judge validation - proceed directly to the next phase after each implementation phase completes!
- The task file must exist in `.specs/tasks/draft/` before running this command (unless in `--refine` mode)!
- When `REFINE_MODE` is true: detect changes via git diff, skip unchanged stages, and pass user feedback to agents!
- Relaunch the judge until you get valid results if any of the following happens:
You MUST launch a separate agent for each step, instead of performing all steps yourself.
CRITICAL: For each agent you MUST:
- Pass `${CLAUDE_PLUGIN_ROOT}` so agents can resolve paths like `@${CLAUDE_PLUGIN_ROOT}/scripts/create-scratchpad.sh`

Note: Phases not in ACTIVE_STAGES are skipped. If SKIP_JUDGES is true, all judge steps are skipped entirely. Human checkpoints (🔍) occur after phases in HUMAN_IN_THE_LOOP_PHASES.
Input: Draft Task File (.specs/tasks/draft/*.md)
│
▼
Phase 2: Parallel Analysis
│
├─────────────────────┬─────────────────────┐
▼ ▼ ▼
Phase 2a: Phase 2b: Phase 2c:
Research Codebase Analysis Business Analysis
[sdd:researcher sonnet] [sdd:code-explorer sonnet] [sdd:business-analyst opus]
Judge 2a Judge 2b Judge 2c
(pass: >THRESHOLD) (pass: >THRESHOLD) (pass: >THRESHOLD)
│ │ │
└─────────────────────┴─────────────────────┘
│
▼
Phase 3: Architecture Synthesis
[sdd:software-architect opus]
Judge 3 (pass: >THRESHOLD)
│
▼
Phase 4: Decomposition
[sdd:tech-lead opus]
Judge 4 (pass: >THRESHOLD)
│
▼
Phase 5: Parallelize
[sdd:team-lead opus]
Judge 5 (pass: >THRESHOLD)
│
▼
Phase 6: Verifications
[sdd:qa-engineer opus]
Judge 6 (pass: >THRESHOLD)
│
▼
Move task: draft/ → todo/
│
▼
Complete

Phase 2 launches three analysis phases in parallel, each with its own judge validation.
Launch these three phases in parallel immediately:
Model: sonnet
Agent: sdd:researcher
Depends on: Task file exists
Purpose: Gather relevant resources, documentation, libraries, and prior art. Creates or updates a reusable skill.
Launch agent:
Description: "Research task resources and create/update skill"
Prompt:
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Task File: <TASK_FILE>
Task Title: <title from task file>
CRITICAL: DO NOT OUTPUT YOUR RESEARCH, ONLY CREATE THE SCRATCHPAD AND SKILL FILE.

Capture:
- Skill file path (`.claude/skills/<skill-name>/SKILL.md`)
- Scratchpad path (`.specs/scratchpad/<hex-id>.md`)

CRITICAL: If the expected files were not created, launch the agent again with the same prompt.
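The "relaunch if artifacts are missing" rule can be sketched as a small loop (hypothetical; `launch_agent` stands in for launching the Task tool with the same prompt):

```python
from pathlib import Path

def run_until_artifacts(launch_agent, expected_paths, max_attempts=3):
    """Re-launch an agent with the same prompt until its artifacts exist."""
    for attempt in range(1, max_attempts + 1):
        launch_agent()
        missing = [p for p in expected_paths if not Path(p).exists()]
        if not missing:
            return True
        # Retry with the identical prompt; the agent should be idempotent.
    return False
```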
Model: sonnet
Agent: sdd:code-explorer
Depends on: Task file exists
Purpose: Identify affected files, interfaces, and integration points
Launch agent:
Description: "Analyze codebase impact"
Prompt:
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Task File: <TASK_FILE>
Task Title: <title from task file>
CRITICAL: DO NOT OUTPUT YOUR ANALYSIS, ONLY CREATE THE SCRATCHPAD AND ANALYSIS FILE.

Capture:
- Analysis file path (`.specs/analysis/analysis-{name}.md`)
- Scratchpad path (`.specs/scratchpad/<hex-id>.md`)

CRITICAL: If the expected files were not created, launch the agent again with the same prompt.
Model: opus
Agent: sdd:business-analyst
Depends on: Task file exists
Purpose: Refine description and create acceptance criteria
Launch agent:
Description: "Business analysis"
Prompt:
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Read ${CLAUDE_PLUGIN_ROOT}/skills/plan-task/analyse-business-requirements.md and execute it exactly as is!
Task File: <TASK_FILE>
Task Title: <title from task file>
CRITICAL: DO NOT OUTPUT YOUR BUSINESS ANALYSIS, ONLY CREATE THE SCRATCHPAD AND UPDATE THE TASK FILE.

Capture:
- Scratchpad path (`.specs/scratchpad/<hex-id>.md`)

After each parallel phase completes, launch its respective judge with the same agent type and model.
Model: sonnet
Agent: sdd:researcher
Depends on: Phase 2a completion
Purpose: Validate skill completeness and relevance
Launch judge:
Description: "Judge skill quality"
Prompt:
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology and execute.
### Artifact Path
{path to skill file from Phase 2a}
### Context
This is a skill document for task: {task title}. Evaluate comprehensiveness and reusability.
### Rubric
1. Resource Coverage (weight: 0.30)
- Documentation and references gathered?
- Libraries and tools identified with recommendations?
- 1=Missing critical resources, 2=Basic coverage, 3=Adequate, 4=Comprehensive, 5=Excellent
2. Pattern Relevance (weight: 0.25)
- Are identified patterns applicable?
- Are recommendations actionable?
- 1=Irrelevant, 2=Somewhat useful, 3=Adequate, 4=Well-targeted, 5=Perfect fit
3. Issue Anticipation (weight: 0.20)
- Common pitfalls identified with solutions?
- 1=None identified, 2=Few issues, 3=Adequate, 4=Good coverage, 5=Comprehensive
4. Reusability (weight: 0.15)
- Is the skill general enough to help multiple tasks?
- Does it avoid task-specific details?
- 1=Too specific, 2=Limited reuse, 3=Adequate, 4=Good, 5=Highly reusable
5. Task Integration (weight: 0.10)
- Was task file updated with skill reference?
- 1=Not updated, 3=Updated, 5=Updated with clear instructions

CRITICAL: Use the prompt exactly as is; do not add anything else, including the output of the implementation agent!
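Per the judge methodology, a rubric like the one above reduces to a single weighted score that is compared against `THRESHOLD`. A sketch (the judge prompt defines the actual methodology; criterion names here are abbreviations):

```python
def weighted_score(ratings, weights):
    """Combine per-criterion ratings (1-5) using weights that sum to 1.0."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1.0"
    return sum(ratings[name] * w for name, w in weights.items())

# Example with the skill-quality rubric weights above:
weights = {
    "resource_coverage": 0.30, "pattern_relevance": 0.25,
    "issue_anticipation": 0.20, "reusability": 0.15, "task_integration": 0.10,
}
ratings = {name: 4 for name in weights}
score = weighted_score(ratings, weights)  # 4.0 → PASS at the default 3.5
```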
Decision Logic:
- PASS (score > THRESHOLD): Research complete, proceed
- FAIL (score < THRESHOLD): Re-launch Phase 2a with feedback

Model: sonnet
Agent: sdd:code-explorer
Depends on: Phase 2b completion
Purpose: Validate file identification accuracy and integration mapping
Launch judge:
Description: "Judge codebase analysis quality"
Prompt:
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology and execute.
### Artifact Path
{path to analysis file from Phase 2b}
### Context
This is codebase impact analysis for task: {task title}. Evaluate accuracy and completeness.
### Rubric
1. File Identification Accuracy (weight: 0.35)
- All affected files identified with specific paths?
- New files and modifications distinguished?
- 1=Major files missing, 2=Mostly correct, 3=Adequate, 4=Precise, 5=Complete
2. Interface Documentation (weight: 0.25)
- Key functions/classes documented with signatures?
- Change requirements clear?
- 1=Missing, 2=Partial, 3=Adequate, 4=Good, 5=Complete
3. Integration Point Mapping (weight: 0.25)
- Integration points identified with impact?
- Similar patterns in codebase found?
- 1=Missing, 2=Partial, 3=Adequate, 4=Good, 5=Comprehensive
4. Risk Assessment (weight: 0.15)
- High risk areas identified with mitigations?
- 1=No assessment, 2=Basic, 3=Adequate, 4=Good, 5=Thorough

CRITICAL: Use the prompt exactly as is; do not add anything else, including the output of the implementation agent!
Decision Logic:
- PASS (score > THRESHOLD): Analysis complete, proceed
- FAIL (score < THRESHOLD): Re-launch Phase 2b with feedback

Model: opus
Agent: sdd:business-analyst
Depends on: Phase 2c completion
Purpose: Validate acceptance criteria quality and scope definition
Launch judge:
Description: "Judge business analysis quality"
Prompt:
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology and execute.
### Artifact Path
{path to task file from Phase 2c}
### Context
This is business analysis output. Evaluate description clarity and acceptance criteria quality.
### Rubric
1. Description Clarity (weight: 0.30)
- What/Why clearly explained?
- Scope boundaries defined?
- 1=Vague, 2=Basic, 3=Adequate, 4=Clear, 5=Excellent
2. Acceptance Criteria Quality (weight: 0.35)
- Criteria specific and testable?
- Given/When/Then format for complex criteria?
- 1=Missing/vague, 2=Basic, 3=Adequate, 4=Good, 5=Excellent
3. Scenario Coverage (weight: 0.20)
- Primary flow documented?
- Error scenarios considered?
- 1=Missing, 2=Basic, 3=Adequate, 4=Good, 5=Comprehensive
4. Scope Definition (weight: 0.15)
- In-scope/out-of-scope explicit?
- No implementation details in description?
- 1=Missing, 2=Partial, 3=Adequate, 4=Good, 5=Clear

CRITICAL: Use the prompt exactly as is; do not add anything else, including the output of the implementation agent!
Decision Logic:
- PASS (score > THRESHOLD): Business analysis complete, proceed
- FAIL (score < THRESHOLD): Re-launch Phase 2c with feedback

Wait for ALL three parallel phases (2a, 2b, 2c) AND their judges to PASS before proceeding to Phase 3.
Model: opus
Agent: sdd:software-architect
Depends on: Phase 2a + Judge 2a PASS, Phase 2b + Judge 2b PASS, Phase 2c + Judge 2c PASS
Purpose: Synthesize research, analysis, and business requirements into architectural overview
Launch agent:
Description: "Architecture synthesis"
Prompt:
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Task File: <TASK_FILE>
Skill File: <skill file path from Phase 2a>
Analysis File: <analysis file path from Phase 2b>
CRITICAL: DO NOT OUTPUT YOUR ARCHITECTURE SYNTHESIS, ONLY CREATE THE SCRATCHPAD AND UPDATE THE TASK FILE.

Capture:
- Scratchpad path (`.specs/scratchpad/<hex-id>.md`)

Model: opus
Agent: sdd:software-architect
Depends on: Phase 3 completion
Purpose: Validate architectural coherence and completeness
Launch judge:
Description: "Judge architecture synthesis quality"
Prompt:
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology and execute.
### Artifact Path
{path to task file after Phase 3}
### Context
This is architecture synthesis output. The Architecture Overview section should contain
solution strategy, key decisions, and only relevant architectural sections.
### Rubric
1. Solution Strategy Clarity (weight: 0.30)
- Approach clearly explained?
- Key decisions documented with reasoning?
- Trade-offs stated?
- 1=Missing/unclear, 2=Basic, 3=Adequate, 4=Clear, 5=Excellent
2. Reference Integration (weight: 0.20)
- Links to research and analysis files?
- Insights from both integrated?
- 1=No links, 2=Partial, 3=Adequate, 4=Good, 5=Fully integrated
3. Section Relevance (weight: 0.25)
- Only relevant sections included (not all)?
- Sections appropriate for task complexity?
- 1=Wrong sections, 2=Mostly appropriate, 3=Adequate, 4=Good, 5=Precisely targeted
4. Expected Changes Accuracy (weight: 0.25)
- Files to create/modify listed?
- Consistent with codebase analysis?
- 1=Missing/inconsistent, 2=Partial, 3=Adequate, 4=Good, 5=Complete

CRITICAL: Use the prompt exactly as is; do not add anything else, including the output of the implementation agent!
Decision Logic:
- PASS (score > THRESHOLD): Architecture synthesis complete, proceed
- FAIL (score < THRESHOLD): Re-launch Phase 3 with feedback

Wait for PASS before Phase 4.
Model: opus
Agent: sdd:tech-lead
Depends on: Phase 3 + Judge 3 PASS
Purpose: Break architecture into implementation steps with success criteria and risks
Launch agent:
Description: "Decompose into implementation steps"
Prompt:
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Task File: <TASK_FILE>
CRITICAL: DO NOT OUTPUT YOUR DECOMPOSITION, ONLY CREATE THE SCRATCHPAD AND UPDATE THE TASK FILE.

Capture:
- Scratchpad path (`.specs/scratchpad/<hex-id>.md`)

Model: opus
Agent: sdd:tech-lead
Depends on: Phase 4 completion
Purpose: Validate implementation steps quality and completeness
Launch judge:
Description: "Judge decomposition quality"
Prompt:
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology and execute.
### Artifact Path
{path to task file after Phase 4}
### Context
This is decomposition output. The Implementation Process section should contain
ordered steps with success criteria, subtasks, blockers, and risks.
### Rubric
1. Step Quality (weight: 0.30)
- Each step has clear goal, output, success criteria?
- Steps ordered by dependency?
- No step too large (>Large estimate)?
- 1=Vague/missing, 2=Basic, 3=Adequate, 4=Good, 5=Excellent
2. Success Criteria Testability (weight: 0.25)
- Criteria specific and verifiable?
- Use actual file paths, function names?
- Subtasks clearly defined with actionable descriptions?
- 1=Vague, 2=Partially testable, 3=Adequate, 4=Good, 5=All testable
3. Risk Coverage (weight: 0.25)
- Blockers identified with resolutions?
- Risks identified with mitigations?
- High-risk tasks identified with decomposition recommendations?
- 1=None, 2=Basic, 3=Adequate, 4=Good, 5=Comprehensive
4. Completeness (weight: 0.20)
- All architecture components have corresponding steps?
- Implementation summary table present?
- Definition of Done included?
- Phases organized: Setup → Foundational → User Stories → Polish?
- 1=Incomplete, 2=Partial, 3=Adequate, 4=Good, 5=Complete

CRITICAL: Use the prompt exactly as is; do not add anything else, including the output of the implementation agent!
Decision Logic:
- PASS (score > THRESHOLD): Decomposition complete, proceed to Phase 5
- FAIL (score < THRESHOLD): Re-launch Phase 4 with feedback

Wait for PASS before Phase 5.
Model: opus
Agent: sdd:team-lead
Depends on: Phase 4 + Judge 4 PASS
Purpose: Reorganize implementation steps for maximum parallel execution
Launch agent:
Description: "Parallelize implementation steps"
Prompt:
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Task File: <TASK_FILE>
Use agents only from this list: {list ALL available agents with plugin prefix if available, e.g. sdd:developer, review:bug-hunter. Also include general agents: opus, sonnet, haiku}
CRITICAL: DO NOT OUTPUT YOUR PARALLELIZATION, ONLY CREATE THE SCRATCHPAD AND UPDATE THE TASK FILE.

Capture:
- Scratchpad path (`.specs/scratchpad/<hex-id>.md`)

Model: opus
Agent: sdd:team-lead
Depends on: Phase 5 completion
Purpose: Validate dependency accuracy and parallelization optimization
Launch judge:
Description: "Judge parallelization quality"
Prompt:
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology and execute.
### Artifact Path
{path to parallelized task file from Phase 5}
### Context
This is the output of Phase 5: Parallelize Steps. The artifact should contain implementation steps
reorganized for maximum parallel execution with explicit dependencies, agent assignments, and
parallelization diagram.
Use agents only from this list: {list ALL available agents with plugin prefix if available, e.g. sdd:developer, review:bug-hunter. Also include general agents: opus, sonnet, haiku}
### Rubric
1. Dependency Accuracy (weight: 0.35)
- Are step dependencies correctly identified?
- No false dependencies (steps marked dependent when they're not)?
- No missing dependencies (steps that actually depend on others)?
- 1=Major dependency errors, 2=Mostly correct, 3=Acceptable, 5=Precise dependencies
2. Parallelization Maximized (weight: 0.30)
- Are parallelizable steps correctly marked with "Parallel with:"?
- Is the parallelization diagram logical?
- 1=No parallelization/wrong, 2=Some optimization, 3=Acceptable, 5=Maximum parallelization
3. Agent Selection Correctness (weight: 0.20)
- Are agent types appropriate for outputs (opus by default, haiku for trivial, sonnet for simple but high in volume)?
- Does selection follow the Agent Selection Guide?
- Are only agents from the provided available agents list used?
- 1=Wrong agents, 2=Mostly appropriate, 3=Acceptable, 4=Optimal selection, 5=Perfect selection
4. Execution Directive Present (weight: 0.15)
- Is the sub-agent execution directive present?
- Are "MUST" requirements for parallel execution clear?
- 1=Missing directive, 2=Partial, 3=Acceptable, 4=Complete directive, 5=Perfect directive

CRITICAL: Use the prompt exactly as is; do not add anything else, including the output of the implementation agent!
Decision Logic:
- PASS (score > THRESHOLD): Proceed to Phase 6
- FAIL (score < THRESHOLD): Re-launch Phase 5 with feedback

Wait for PASS before Phase 6.
Model: opus
Agent: sdd:qa-engineer
Depends on: Phase 5 + Judge 5 PASS
Purpose: Add LLM-as-Judge verification sections with rubrics
Launch agent:
Description: "Define verification rubrics"
Prompt:
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Task File: <TASK_FILE>
CRITICAL: DO NOT OUTPUT YOUR VERIFICATIONS, ONLY CREATE THE SCRATCHPAD AND UPDATE THE TASK FILE.

Capture:
- Scratchpad path (`.specs/scratchpad/<hex-id>.md`)

Model: opus
Agent: sdd:qa-engineer
Depends on: Phase 6 completion
Purpose: Validate verification rubrics and thresholds
Launch judge:
Description: "Judge verification quality"
Prompt:
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology and execute.
### Artifact Path
{path to task file with verifications from Phase 6}
### Context
This is the output of Phase 6: Define Verifications. The artifact should contain LLM-as-Judge
verification sections for each implementation step, including verification levels, custom rubrics,
thresholds, and a verification summary table.
### Rubric
1. Verification Level Appropriateness (weight: 0.30)
- Do verification levels match artifact criticality?
- HIGH criticality → Panel, MEDIUM → Single/Per-Item, LOW/NONE → None?
- 1=Mismatched levels, 2=Mostly appropriate, 3=Acceptable, 5=Precisely calibrated
2. Rubric Quality (weight: 0.30)
- Are criteria specific to the artifact type (not generic)?
- Do weights sum to 1.0?
- Are descriptions clear and measurable?
- 1=Generic/broken rubrics, 2=Adequate, 3=Acceptable, 5=Excellent custom rubrics
3. Threshold Appropriateness (weight: 0.20)
- Are thresholds reasonable (typically 4.0/5.0)?
- Higher for critical, lower for experimental?
- 1=Wrong thresholds, 2=Standard applied, 3=Acceptable, 5=Context-appropriate
4. Coverage Completeness (weight: 0.20)
- Does every step have a Verification section?
- Is the Verification Summary table present?
- 1=Missing verifications, 2=Most covered, 3=Acceptable, 5=100% coverage

CRITICAL: Use the prompt exactly as is; do not add anything else, including the output of the implementation agent!
Decision Logic:
- PASS (score > THRESHOLD): Workflow complete, promote task
- FAIL (score < THRESHOLD): Re-launch Phase 6 with feedback

Purpose: Move the refined task from draft to todo folder
After all phases complete:
Move task file from draft to todo:
git mv <TASK_FILE> .specs/tasks/todo/
# Fallback if git not available: mv <TASK_FILE> .specs/tasks/todo/

Update any references in research and analysis files if needed.
After all executed phases and judges complete:
### Task Refined
| Property | Value |
|----------|-------|
| **Original File** | `<original TASK_FILE path>` |
| **Final Location** | `.specs/tasks/todo/<filename>` (ready for implementation) |
| **Title** | `<task title>` |
| **Type** | `<feature/bug/refactor/test/docs/chore/ci>` (from filename) |
| **Skill** | `<skill file path or "Skipped">` |
| **Skill Action** | `<Created new / Updated existing / Skipped>` |
| **Analysis** | `<analysis file path or "Skipped">` |
| **Scratchpad** | `<scratchpad file path>` |
| **Implementation Steps** | `<count or "N/A">` |
| **Parallelization Depth** | `<max parallel agents or "N/A">` |
| **Total Verifications** | `<count or "N/A">` |
### Configuration Used
| Setting | Value |
|---------|-------|
| **Target Quality** | {THRESHOLD}/5.0 |
| **Max Iterations** | {MAX_ITERATIONS} |
| **Active Stages** | {ACTIVE_STAGES as comma-separated list} |
| **Skipped Stages** | {SKIP_STAGES or stages not in ACTIVE_STAGES} |
| **Human Checkpoints** | Phase {HUMAN_IN_THE_LOOP_PHASES as comma-separated} |
| **Skip Judges** | {SKIP_JUDGES} |
| **Refine Mode** | {REFINE_MODE} |
### Quality Gates Summary
| Phase | Judge Score | Verdict |
|-------|-------------|---------|
| Phase 2a: Research | X.X/5.0 | ✅ PASS / ⚠️ PROCEEDED (max iter) / ⏭️ SKIPPED |
| Phase 2b: Codebase Analysis | X.X/5.0 | ✅ PASS / ⚠️ PROCEEDED (max iter) / ⏭️ SKIPPED |
| Phase 2c: Business Analysis | X.X/5.0 | ✅ PASS / ⚠️ PROCEEDED (max iter) / ⏭️ SKIPPED |
| Phase 3: Architecture Synthesis | X.X/5.0 | ✅ PASS / ⚠️ PROCEEDED (max iter) / ⏭️ SKIPPED |
| Phase 4: Decomposition | X.X/5.0 | ✅ PASS / ⚠️ PROCEEDED (max iter) / ⏭️ SKIPPED |
| Phase 5: Parallelize | X.X/5.0 | ✅ PASS / ⚠️ PROCEEDED (max iter) / ⏭️ SKIPPED |
| Phase 6: Verify | X.X/5.0 | ✅ PASS / ⚠️ PROCEEDED (max iter) / ⏭️ SKIPPED |
**Threshold Used:** {THRESHOLD}/5.0 (or N/A if SKIP_JUDGES)
**Legend:**
- ✅ PASS - Score >= THRESHOLD
- ⚠️ PROCEEDED (max iter) - Score < THRESHOLD but MAX_ITERATIONS reached, proceeded anyway
- ⏭️ SKIPPED - Stage not in ACTIVE_STAGES
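The verdict column follows mechanically from the legend. A sketch, assuming the legend's `>=` comparison for PASS:

```python
def verdict(stage_active, score, threshold, iterations, max_iterations):
    """Map a phase's final state to the summary-table verdict."""
    if not stage_active:
        return "⏭️ SKIPPED"                 # stage not in ACTIVE_STAGES
    if score >= threshold:
        return "✅ PASS"
    if iterations >= max_iterations:
        return "⚠️ PROCEEDED (max iter)"     # proceeded anyway, with warning
    return "RETRYING"                        # not yet final
```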
### Artifacts Generated

.claude/
└── skills/
    └── <skill-name>/
        └── SKILL.md              # Reusable skill document (if research stage ran)
.specs/
├── tasks/
│   ├── draft/                    # Draft tasks (source - now empty for this task)
│   ├── todo/
│   │   └── <name>.<type>.md      # Complete task specification (ready for implementation)
│   ├── in-progress/              # Tasks being implemented (empty)
│   └── done/                     # Completed tasks (empty)
├── analysis/
│   └── analysis-<name>.md        # Codebase impact analysis (if codebase analysis stage ran)
└── scratchpad/
    └── <hex-id>.md               # Architecture thinking scratchpad
### Task Status Management
Task status is managed by folder location:
- `draft/` - Tasks created but not yet refined
- `todo/` - Tasks ready for implementation
- `in-progress/` - Tasks currently being worked on
- `done/` - Completed tasks
### Next Steps
1. Review task: `.specs/tasks/todo/<filename>`
- Edit the task file directly to make corrections
- Add `//` comments to lines that need clarification or changes
- Run `/plan` again with `--refine` to incorporate your feedback — it detects changes against git and propagates updates **top-to-bottom** (editing a section only affects sections below it, not above)
2. If everything is fine, begin implementation: `/implement` (will auto-select the task from todo/)

If any phase agent fails unexpectedly:
If any judge returns FAIL (score < THRESHOLD):
- If the phase is in `HUMAN_IN_THE_LOOP_PHASES`, trigger a human checkpoint before the next judge retry (after the implementation retry but before re-judging)
- If `MAX_ITERATIONS` is reached: proceed to the next stage automatically (do NOT ask the user unless `--human-in-the-loop` includes this phase) and warn: "⚠️ Phase X did not pass quality threshold (X.X/THRESHOLD) after MAX_ITERATIONS iterations"

Implementation → Judge FAIL → Implementation Retry → Judge Retry
↓
PASS → Continue to next stage
FAIL → Repeat until MAX_ITERATIONS
↓
MAX_ITERATIONS reached → Proceed to next stage (with warning)

When phase is in HUMAN_IN_THE_LOOP_PHASES:
Implementation → Judge FAIL → Implementation Retry
↓
🔍 Human Checkpoint (optional feedback)
↓
Judge Retry
↓
PASS → Continue | FAIL → Repeat until MAX_ITERATIONS
↓
MAX_ITERATIONS → 🔍 Final Human Checkpoint
↓
User confirms → Proceed to next stage
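Putting the retry flow together as a sketch (hypothetical; `implement`, `judge`, and `human_checkpoint` stand in for agent launches and the checkpoint prompt):

```python
def run_phase(implement, judge, threshold, max_iterations,
              human_in_loop=False, human_checkpoint=lambda: None):
    """Run one phase's implementation/judge retry cycle."""
    for iteration in range(1, max_iterations + 1):
        implement()
        if human_in_loop and iteration > 1:
            human_checkpoint()        # between implementation retry and re-judging
        score = judge()
        if score >= threshold:
            return "PASS"
    if human_in_loop:
        human_checkpoint()            # final checkpoint at MAX_ITERATIONS
    return "PROCEEDED (max iter)"     # proceed to next stage with a warning
```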