sharaf/agentic-harness-architect

Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system — covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery (greenfield design or audit mode).

Phase 10: Simplification Audit

Name: sharaf/agentic-harness-architect
Rating: 100 (1 review)
Author: sharaf

Every harness component encodes an assumption about model limitations. As models improve, remove the components whose assumptions no longer hold, and add complexity only at the frontier of model capability.

The Removal Test

Before adding any component, ask what would break without it. If you cannot name exactly what breaks, you do not need it yet.

Complexity Justification Matrix

Every new component must pass all four criteria:

Demonstrated failure: reproducible failure case without it
Model insufficiency: orchestration problem, not reasoning problem
Proportionality: complexity cost proportional to failure severity and frequency
Temporality: component will likely still be needed in 6 months
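
The four criteria above can be recorded as an explicit checklist so that a proposed component is rejected unless every one holds. The following is a minimal sketch, not part of the original skill: the `Justification` type, its field names, and the helper functions are hypothetical names chosen for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Justification:
    """Hypothetical record of the four criteria for one proposed component."""
    demonstrated_failure: bool  # a reproducible failure case exists without it
    model_insufficiency: bool   # orchestration problem, not a reasoning problem
    proportionality: bool       # cost is proportional to failure severity and frequency
    temporality: bool           # likely still needed in 6 months


def failed_criteria(j: Justification) -> list[str]:
    """Return the names of the criteria that the component does not meet."""
    return [name for name, ok in asdict(j).items() if not ok]


def should_add(j: Justification) -> bool:
    """A component is justified only if all four criteria pass."""
    return not failed_criteria(j)


# Example: a component with a real failure case but a disproportionate cost.
proposal = Justification(
    demonstrated_failure=True,
    model_insufficiency=True,
    proportionality=False,
    temporality=True,
)
print(failed_criteria(proposal))  # ['proportionality']
```

Returning the failed criteria by name, rather than a bare boolean, keeps the reason for a rejection visible in review notes.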

Ablation protocol

Remove harness components one at a time
Re-evaluate performance after each removal
Never remove multiple components simultaneously
Triggers: a new model release, a quarterly review, or a component that has not been a root cause in 30+ days
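
The protocol above can be sketched as a single-component sweep: evaluate the full harness once, then re-evaluate with exactly one component disabled at a time. This is an illustrative sketch under stated assumptions. The `evaluate` callback, the component names, and the integer scores are hypothetical stand-ins for a real evaluation pipeline.

```python
from typing import Callable, Iterable


def ablate_one_at_a_time(
    components: Iterable[str],
    evaluate: Callable[[frozenset[str]], float],
) -> dict[str, float]:
    """Return the score change caused by removing each component on its own.

    `evaluate` is assumed to run the eval suite with only the given
    components enabled and return a score where higher is better.
    """
    full = frozenset(components)
    baseline = evaluate(full)
    deltas: dict[str, float] = {}
    for name in sorted(full):
        # Exactly one component is removed per run, never several at once.
        deltas[name] = evaluate(full - {name}) - baseline
    return deltas


def removal_candidates(deltas: dict[str, float], tolerance: float = 0.0) -> list[str]:
    """Components whose individual removal costs no more than `tolerance`."""
    return [name for name, delta in deltas.items() if delta >= -tolerance]


# Toy evaluation (hypothetical): only the evaluator affects the score.
def toy_evaluate(active: frozenset[str]) -> float:
    return 80 if "evaluator" in active else 50


deltas = ablate_one_at_a_time(["evaluator", "sprint_planner"], toy_evaluate)
print(removal_candidates(deltas))  # ['sprint_planner']
```

The candidate list is not a licence to drop everything on it together. Removing one candidate and re-running the sweep keeps to the rule of never removing multiple components simultaneously.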

The most durable pattern

Generator-Evaluator separation survives across model generations. Sprint decomposition was removed when Opus 4.6 arrived. Context management is infrastructure, not reasoning scaffolding — the Bitter Lesson does not apply to it.
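
One way to picture Generator-Evaluator separation is a loop in which the component that produces a candidate never judges it. The sketch below is a hypothetical illustration, not the skill's own implementation: `Verdict`, `run_loop`, and the stub functions are assumed names, and in a real harness each callback would typically wrap a separate model call with its own prompt.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    accepted: bool
    feedback: str = ""


def run_loop(
    generate: Callable[[str, str], str],      # (task, feedback) -> candidate
    evaluate: Callable[[str, str], Verdict],  # (task, candidate) -> verdict
    task: str,
    max_rounds: int = 3,
) -> Optional[str]:
    """Alternate generation and independent evaluation until acceptance.

    Returns the accepted candidate, or None if max_rounds is exhausted.
    """
    feedback = ""
    for _ in range(max_rounds):
        candidate = generate(task, feedback)
        verdict = evaluate(task, candidate)  # the generator never grades itself
        if verdict.accepted:
            return candidate
        feedback = verdict.feedback
    return None


# Stub callbacks standing in for model calls (purely illustrative).
def stub_generate(task: str, feedback: str) -> str:
    return "patch+tests" if feedback else "patch"


def stub_evaluate(task: str, candidate: str) -> Verdict:
    if "tests" in candidate:
        return Verdict(accepted=True)
    return Verdict(accepted=False, feedback="add tests")


print(run_loop(stub_generate, stub_evaluate, "fix bug"))  # patch+tests
```

Because the two roles meet only through the candidate and the verdict, either side can be swapped for a newer model without touching the other.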

The 2AM Debuggability Test

Can one person hold the entire architecture in their head?
Are failure modes explicit (visible in logs) rather than emergent (arising from component interactions)?
Can you explain why each component exists without historical context?

If any answer is "no," simplify.