Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system — covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery (greenfield design or audit mode).
The design is complete when it satisfies architecture completeness, hits target benchmarks, and passes the quality gates below.
| Metric | Target | Source |
|---|---|---|
| Tool definitions | < 20% of context budget | Action Space research |
| Context utilization | 40-60% (FIC target) | Context Window research |
| Observation signal density | > 50% useful tokens | Observation Formatting research |
| Generator-Critic iterations | 2-3 max | Evaluation research |
| Agent count | 2-4 (saturation threshold) | Multi-Agent research |
| Decomposition depth | Max 3 levels | Task Decomposition research |
| Subtasks per level | Max 12 | Task Decomposition research |
| Evaluation leniency | ±0.1 on normalized scale | Evaluation research |
| Error recovery rate | > 70% (73.5% on failed tasks benchmark) | Error Handling research |
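
The targets in the table can be checked mechanically during an audit. The following is a minimal sketch, not part of the skill itself: the `DesignMetrics` class, its field names, and `check_targets` are hypothetical, and it assumes percentages are stored as fractions between 0 and 1. It also reads "2-3 max" and "2-4" as upper bounds (so a single-agent design passes), which is an interpretation rather than something the table states.

```python
from dataclasses import dataclass


@dataclass
class DesignMetrics:
    """Hypothetical container for measured values; fractions are in [0, 1]."""
    tool_definition_fraction: float    # share of context budget spent on tool definitions
    context_utilization: float         # share of the context window in use
    observation_signal_density: float  # share of observation tokens that are useful
    generator_critic_iterations: int
    agent_count: int
    decomposition_depth: int
    max_subtasks_per_level: int
    evaluation_leniency: float         # signed deviation on a normalized scale
    error_recovery_rate: float


def check_targets(m: DesignMetrics) -> list[str]:
    """Return the names of the table rows whose target is not met."""
    checks = {
        "Tool definitions": m.tool_definition_fraction < 0.20,
        "Context utilization": 0.40 <= m.context_utilization <= 0.60,
        "Observation signal density": m.observation_signal_density > 0.50,
        # Assumption: "2-3 max" is treated as an upper bound of 3.
        "Generator-Critic iterations": m.generator_critic_iterations <= 3,
        # Assumption: "2-4" is treated as an upper bound of 4.
        "Agent count": m.agent_count <= 4,
        "Decomposition depth": m.decomposition_depth <= 3,
        "Subtasks per level": m.max_subtasks_per_level <= 12,
        "Evaluation leniency": abs(m.evaluation_leniency) <= 0.1,
        "Error recovery rate": m.error_recovery_rate > 0.70,
    }
    return [name for name, ok in checks.items() if not ok]


if __name__ == "__main__":
    # Illustrative values only, not measurements from any real agent.
    draft = DesignMetrics(0.25, 0.80, 0.60, 2, 3, 2, 8, 0.05, 0.75)
    for row in check_targets(draft):
        print(f"Target missed: {row}")
```

An empty list means every benchmark row is satisfied; architecture completeness and the quality gates still need to be reviewed separately.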