Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system — covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery (greenfield design or audit mode).
Select the execution loop pattern based on the task profile and expected duration. Layer multiple termination mechanisms so that a single missed stop condition never leaves the loop running.
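Below is a minimal sketch of layered termination. It assumes a harness where each step is one model call plus tool execution. `Budget`, `LoopState`, `stop_reason`, and `run_loop` are hypothetical names, and the default limits are placeholders, not recommendations from this guide.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple


@dataclass
class Budget:
    """Hypothetical limits; each one is an independent stop condition."""
    max_steps: int = 50
    max_cost_usd: float = 5.0
    max_seconds: float = 600.0


@dataclass
class LoopState:
    steps: int = 0
    cost_usd: float = 0.0
    done: bool = False  # set by the step when the agent signals completion
    started: float = field(default_factory=time.monotonic)


def stop_reason(state: LoopState, budget: Budget) -> Optional[str]:
    """Return why the loop should stop, or None to keep going.

    The checks are layered: the first one that trips wins, so a missing
    completion signal is still caught by the step, cost, or time limits.
    """
    if state.done:
        return "task_complete"
    if state.steps >= budget.max_steps:
        return "max_steps"
    if state.cost_usd >= budget.max_cost_usd:
        return "cost_budget"
    if time.monotonic() - state.started >= budget.max_seconds:
        return "timeout"
    return None


def run_loop(step: Callable[[LoopState], None], budget: Budget) -> Tuple[str, int]:
    """Run one agent step per iteration until any termination layer trips."""
    state = LoopState()
    while (reason := stop_reason(state, budget)) is None:
        step(state)  # the step updates cost_usd and may set done
        state.steps += 1
    return reason, state.steps
```

Take a step that spends $1 per call and never signals completion. Under `Budget(max_cost_usd=3.0)`, it stops with `("cost_budget", 3)`.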
| Pattern | Best for | Duration | Cost | Key mechanism |
|---|---|---|---|---|
| ReAct | Simple tasks, 1-15 tool calls | Minutes | Low | Thought-Action-Observation cycle |
| Generator-Critic | Quality-critical output | Minutes-hours | Medium | Separate evaluator with rubrics; cap at 3 iterations |
| Ralph Loop | Mechanical/verifiable tasks (refactoring, migration) | Hours | $5-150 | Outer verification wrapper; fresh context per iteration |
| Magentic-One Dual-Loop | Complex/unpredictable multi-tool tasks | Hours | Medium-High | Task Ledger + Progress Ledger; strategic replanning |
| Orchestrator-Worker | Parallelizable subtasks | Hours | High (~15x tokens vs. chat) | Fan-out with isolated contexts; Opus orchestrator, Sonnet workers |
| Build-Verify-Fix | Any coding task with test infrastructure | Minutes-hours | Medium | Planning-Build-Verify-Fix phases with middleware interception |
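Here is a minimal sketch of the Generator-Critic row. It assumes the generator and evaluator are injected as plain callables; in practice they would be separate model calls with separate prompts. `generate`, `critique`, and `Critique` are hypothetical names. Only the 3-iteration cap comes from the table.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Critique:
    passed: bool    # did the draft meet the rubric?
    feedback: str   # rubric-based notes fed back to the generator


def generator_critic(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    critique: Callable[[str, str], Critique],
    max_iterations: int = 3,  # cap from the pattern table
) -> Tuple[str, int, bool]:
    """Return (last draft, iterations used, whether the critic accepted it)."""
    feedback: Optional[str] = None
    draft = ""
    for iteration in range(1, max_iterations + 1):
        # The generator sees only the task and the latest feedback.
        draft = generate(task, feedback)
        # A separate evaluator grades the draft against its rubric.
        result = critique(task, draft)
        if result.passed:
            return draft, iteration, True
        feedback = result.feedback
    # Cap reached: hand back the last draft and let the caller decide.
    return draft, max_iterations, False
```

Returning an unaccepted draft with `False`, rather than raising, is one option. It lets the outer loop decide how to escalate, for example to a human reviewer.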
Reasoning effort by phase:

- Use xhigh reasoning for the planning and verification phases.
- Use high reasoning during implementation.
- Running xhigh throughout causes timeouts and lower scores (53.9% vs. 66.5%).
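One way to apply the phase split above is a per-phase lookup that the harness consults before each model call. This is a sketch. `Phase`, `REASONING_EFFORT`, `effort_for`, and `plan_efforts` are hypothetical names, and the phase names follow the Build-Verify-Fix row of the table. Mapping the Fix phase to high effort is an assumption, because the guidance only names planning, verification, and implementation.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Phase(Enum):
    PLANNING = "planning"
    BUILD = "build"
    VERIFY = "verify"
    FIX = "fix"


# Effort labels are opaque strings handed to whatever model client the
# harness uses; this sketch does not call any real API.
REASONING_EFFORT: Dict[Phase, str] = {
    Phase.PLANNING: "xhigh",
    Phase.BUILD: "high",
    Phase.VERIFY: "xhigh",
    Phase.FIX: "high",  # assumption: fixing counts as implementation work
}


def effort_for(phase: Phase) -> str:
    """Look up a phase's reasoning effort; a phase missing from the table raises KeyError."""
    return REASONING_EFFORT[phase]


def plan_efforts(phases: List[Phase]) -> List[Tuple[str, str]]:
    """Pair each phase in a run with its effort level, in order."""
    return [(phase.value, effort_for(phase)) for phase in phases]
```

Keeping the mapping in data, rather than scattered across call sites, makes settings easy to compare. For example, you can re-run an eval with every phase set to xhigh.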