Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system — covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery (greenfield design or audit mode).
Reach for these when a phase decision feels ambiguous. Each chart maps a question to a recommended path; consult the corresponding phase reference for full rationale.
Is the task completable by a single agent in one context window?
├── Yes → Single agent (Level 0)
│   └── Does self-evaluation produce reliable quality signals?
│       ├── Yes (tests exist) → Single agent + deterministic eval
│       └── No (subjective) → Add separated evaluator (Level 1)
└── No → Multi-agent required
    ├── Are subtasks genuinely parallelizable?
    │   ├── Yes → Fan-Out/Gather topology
    │   └── No → Pipeline or Hierarchical
    ├── Is output verifiable automatically?
    │   └── Yes → Cascade (single first, escalate on failure)
    └── Does cost justify 15x token premium?
        ├── No → Decompose into sequential single-agent tasks
        └── Yes → Select topology from phase-02 table

What is the task duration?
├── Minutes (1-15 tool calls) → ReAct
├── Minutes-hours, quality-critical → Generator-Critic (cap 3 iterations)
├── Hours, mechanical/verifiable → Ralph Loop (fresh context per iteration)
├── Hours, unpredictable → Magentic-One Dual-Loop
└── Hours, parallelizable → Orchestrator-Worker

Expected task duration?
├── < 30 min → No management needed
├── 30 min - 3 hours → FIC at phase boundaries (target 40-60%)
├── 3+ hours → FIC + sub-agents OR one-session-per-task resets
└── Automated loop → One-session-per-task with external state

Is the output objectively testable?
├── Yes
│   ├── Test suite exists → Deterministic eval (no LLM needed)
│   └── No test suite → LLM-generated tests → deterministic eval
└── No (subjective quality)
    ├── Consistent domain → Fixed calibrated rubric with separated evaluator
    └── Variable domain → Adaptive rubric (AdaRubric)

Should I decompose this task?
├── Changes < 4 files AND fits in one session? → No decomposition
├── Natural independence boundaries exist? → Feature-level decomposition
├── Features have internal complexity? → Sprint-level with contracts
├── Complex dependencies? → DAG-based task graphs
└── Optimal granularity uncertain? → ADaPT (decompose on failure only)

What type of error?
├── Transient infrastructure → Exponential backoff (max 3 retries: 5s/15s/60s)
├── Logic error in generated code → Run tests, feed error → flush context if stuck
├── Doom loop detected → Revert code, flush context, restart with lessons only
├── Premature completion → QA agent validates against feature list
├── Multi-agent cascade → Circuit breaker + SagaLLM compensation
└── Context exhaustion → Checkpoint, start new session with summary

Should I keep this harness component?
├── Is it infrastructure (state, persistence, error recovery)?
│   └── Yes → Keep; re-evaluate implementation, not existence
├── Is it reasoning scaffolding?
│   ├── Has the model improved since this was added?
│   │   ├── No → Keep; schedule re-evaluation at next model release
│   │   └── Yes → Run 10 representative tasks without it
│   │       ├── Success rate holds → Remove permanently
│   │       ├── Degrades → Keep; document specific failure cases
│   │       └── Mixed → Implement conditional activation
│   └── Can you name exactly what breaks without it?
│       ├── No → Remove (Removal Test failed)
│       └── Yes → Keep if it passes Complexity Justification Matrix
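
The "Run 10 representative tasks without it" branch of the harness-component chart can be sketched as a small comparison over paired runs. This is a minimal illustration only: `TaskResult` and `ablation_verdict` are hypothetical names, and the reading of "holds", "degrades", and "mixed" in terms of per-task regressions and improvements is an assumption, since the chart does not define thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """Outcome of one representative task, run with and without the component."""
    task_id: str
    passed_with: bool     # task passed with the component enabled
    passed_without: bool  # task passed with the component removed


def ablation_verdict(results: list[TaskResult]) -> str:
    """Map paired results onto the chart's three outcomes.

    Assumed interpretation (not defined by the chart itself):
      - no task regresses without the component -> "remove"
      - some tasks regress and some improve     -> "conditional"
      - tasks regress and none improve          -> "keep"
    """
    regressions = [r for r in results if r.passed_with and not r.passed_without]
    improvements = [r for r in results if not r.passed_with and r.passed_without]

    if not regressions:
        return "remove"       # success rate holds: remove permanently
    if improvements:
        return "conditional"  # mixed: implement conditional activation
    return "keep"             # degrades: keep and document the failure cases


if __name__ == "__main__":
    sample = [
        TaskResult("refactor-module", True, True),
        TaskResult("fix-flaky-test", True, False),
        TaskResult("add-endpoint", False, True),
    ]
    print(ablation_verdict(sample))
```

When the verdict is "keep", the task IDs in `regressions` are the natural starting point for the failure cases the chart asks you to document.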