Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system — covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery (greenfield design or audit mode).
100
100%
Does it follow best practices?
Impact
100%
1.23xAverage score across 4 eval scenarios
Passed
No known issues
Design a complete agentic coding harness from requirements, or audit an existing harness for architectural improvements. The harness — not the model — determines the quality ceiling.
Pick a mode:
greenfield — design a new harness from stated requirementsaudit — analyze an existing harness and produce an improvement planOutput a structured design document with architecture decisions, trade-offs, and justifications grounded in benchmarks across 10 sub-domains (781 sources).
The design covers 10 architectural phases plus an output phase (11). For greenfield, walk through phases 1-11 in order. For audit, start at Phase 1 to inventory the current system, then jump to phases targeting known issues; always consult Phase 10 (simplification) before recommending additions.
| # | Decision | Reference |
|---|---|---|
| 1 | Requirements analysis | phase-01-requirements.md |
| 2 | Single vs. multi-agent + topology | phase-02-architecture-selection.md |
| 3 | Loop design + termination + reasoning budget | phase-03-loop-design.md |
| 4 | Action space, tool granularity, sandboxing | phase-04-action-space.md |
| 5 | Observation formatting + context management | phase-05-observations-and-context.md |
| 6 | Evaluation architecture + rubrics | phase-06-evaluation.md |
| 7 | Error handling, LoopGuard, recovery | phase-07-error-handling.md |
| 8 | Prompt engineering | phase-08-prompt-engineering.md |
| 9 | Task decomposition | phase-09-task-decomposition.md |
| 10 | Simplification audit | phase-10-simplification.md |
| 11 | Produce design document | (output format below) |
All linked reference files are bundled with this tile under references/; they are on-demand detail, not external documentation. Load only the phase file or cross-cutting reference needed for the current decision.
One-liners to anchor where to start; consult phase references for the full rationale.
system-prompt.md or prompt-architecture.md, include a named Just-in-Time Steering Protocol section. State that decision-specific guidance is injected immediately before relevant tool calls or decision points, not front-loaded into the base system prompt.Conditional Rules section with explicit if / then / otherwise branches for permission decisions, large-file edits, verification, and recovery.For greenfield: task profile, quality target, duration profile, cost constraints, model access, verification infrastructure, security requirements, human-in-the-loop posture.
For audit: current architecture description or codebase, known failure modes and pain points, performance metrics if available (completion rate, token usage, cost per task).
Full checklist in phase-01-requirements.md.
## Requirements Summary
## Architecture Decision: [Single-Agent | Multi-Agent Topology]
## Loop Design
## Action Space
## Observation Formatting Strategy
## Context Management Strategy
## Evaluation Design
## Error Handling & Recovery
## Prompt Architecture
## Decomposition Strategy
## Complexity Budget & Simplification Plan
## Key Metrics to Track
## Open QuestionsEach section states: the decision, the rationale citing specific benchmarks, alternatives considered, and conditions under which to revisit.
## Current Architecture Summary
## Identified Issues (ordered by leverage)
## Improvement Sequence
## Components to Remove (Ablation Candidates)
## Components to Add
## Migration Path
## Open QuestionsEach issue includes: severity, evidence, why it matters, recommended change, and expected impact with benchmarks.
evals
scenario-1
scenario-2
scenario-3
scenario-4
references