Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system — covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery (greenfield design or audit mode).
Every harness component encodes an assumption about model limitations. As models improve, remove components and add complexity at the frontier.
Before adding any component, ask what breaks if you remove it instead. If you cannot name exactly what breaks, you do not need it yet.
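One way to answer "what breaks if you remove it" is a small ablation run: score the harness with every component enabled, then again with each component removed in turn. The sketch below is illustrative only. The component names, the `run_evals` callable, and the toy scorer are hypothetical stand-ins for a real eval suite, not part of any particular harness.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List


def ablation_report(
    components: Iterable[str],
    run_evals: Callable[[FrozenSet[str]], int],
) -> Dict[str, int]:
    """For each component, return how many eval passes are lost when it is removed."""
    full = frozenset(components)
    baseline = run_evals(full)
    return {name: baseline - run_evals(full - {name}) for name in sorted(full)}


def removable(report: Dict[str, int], tolerance: int = 0) -> List[str]:
    """Components whose removal costs at most `tolerance` passes."""
    return [name for name, drop in report.items() if drop <= tolerance]


# Hypothetical toy scorer: pretends each component contributes a fixed
# number of passing scenarios. A real run_evals would execute the eval suite.
ASSUMED_CONTRIBUTION = {"evaluator": 2, "compaction": 1, "sprint_decomposition": 0}


def toy_run_evals(enabled: FrozenSet[str]) -> int:
    return 1 + sum(ASSUMED_CONTRIBUTION[name] for name in enabled)


report = ablation_report(ASSUMED_CONTRIBUTION, toy_run_evals)
print(report)
print(removable(report))
```

With these assumed numbers, only `sprint_decomposition` costs nothing to remove, so it is the one component that nothing can be shown to depend on. Real eval scores are noisy, so in practice `tolerance` would likely be set from run-to-run variance rather than left at zero.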
Every new component must pass all four criteria:
Generator-evaluator separation survives across model generations; sprint decomposition did not, and was removed when Opus 4.6 arrived. Context management is infrastructure, not reasoning scaffolding, so the Bitter Lesson does not apply to it.
If any answer is "no," simplify.