sharaf/agentic-harness-architect

Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system — covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery (greenfield design or audit mode).

references/phase-10-simplification.md

Phase 10: Simplification Audit

Every harness component encodes an assumption about model limitations. As models improve, those assumptions expire: remove the components built on them, and add new complexity only at the frontier of what the current model still cannot do.

The Removal Test

Before adding any component, ask the inverse question: what would break without it? If you cannot name exactly what breaks, you do not need it yet.

Complexity Justification Matrix

Every new component must pass all four criteria:

  1. Demonstrated failure: reproducible failure case without it
  2. Model insufficiency: an orchestration problem, not a reasoning problem that a better model would fix
  3. Proportionality: complexity cost proportional to failure severity and frequency
  4. Temporality: component will likely still be needed in 6 months
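The matrix can be applied as a gate during design review. Below is a minimal sketch, assuming each criterion has already been judged by a person and recorded as a boolean; `ComponentProposal`, `failed_criteria`, and the `retry-wrapper` component are hypothetical names, not part of any library or of the source skill.

```python
from dataclasses import dataclass, fields


@dataclass
class ComponentProposal:
    """Hypothetical record of the review decision for one proposed harness component."""
    name: str
    demonstrated_failure: bool  # a reproducible failure case exists without it
    model_insufficiency: bool   # the gap is orchestration, not reasoning
    proportionality: bool       # cost is proportional to failure severity and frequency
    temporality: bool           # likely still needed in 6 months


def failed_criteria(proposal: ComponentProposal) -> list[str]:
    """Return the criteria the proposal fails; an empty list means it passes all four."""
    return [
        f.name
        for f in fields(proposal)
        if f.name != "name" and not getattr(proposal, f.name)
    ]


proposal = ComponentProposal(
    name="retry-wrapper",  # hypothetical component
    demonstrated_failure=True,
    model_insufficiency=True,
    proportionality=True,
    temporality=False,
)
print(failed_criteria(proposal))  # ['temporality']
```

Because the rule is "all four", a single failed criterion is enough to reject the component.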

Ablation protocol

  • Remove harness components one at a time
  • Re-evaluate performance after each removal
  • Never remove multiple components simultaneously
  • Triggers: new model release, quarterly review, component not root cause in 30+ days
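The protocol above can be sketched as a leave-one-out loop: each run disables exactly one component and re-evaluates, so any score change is attributable to that component alone. This is an illustration under stated assumptions. `ablate`, `toy_evaluate`, the component names, and the `TOLERANCE` value are hypothetical, and `evaluate` stands in for whatever eval suite the harness actually uses.

```python
from typing import Callable


def ablate(
    components: list[str],
    evaluate: Callable[[frozenset[str]], float],
) -> dict[str, float]:
    """Leave-one-out ablation: disable exactly one component per run.

    `evaluate` stands in for running the eval suite with the given set of
    enabled components; higher scores are better. Returns, per component,
    the score change relative to the full harness.
    """
    full = frozenset(components)
    baseline = evaluate(full)
    deltas: dict[str, float] = {}
    for component in components:
        # Every other component stays enabled, so any change is attributable to this one.
        deltas[component] = evaluate(full - {component}) - baseline
    return deltas


# Toy evaluator (hypothetical): only context compaction moves the score.
def toy_evaluate(enabled: frozenset[str]) -> float:
    return 0.80 if "context-compaction" in enabled else 0.55


TOLERANCE = 0.02  # assumed noise margin; real eval suites need a measured one
deltas = ablate(["context-compaction", "sprint-planner", "self-critique"], toy_evaluate)
for name, delta in deltas.items():
    verdict = "keep" if delta < -TOLERANCE else "candidate for removal"
    print(f"{name}: {delta:+.2f} ({verdict})")
```

A sequential variant, which keeps each removal that does not hurt the score before trying the next, is equally consistent with the one-at-a-time rule. In both cases, compare deltas against the eval suite's measured noise rather than against zero.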

The most durable pattern

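Generator-Evaluator separation (one component produces the work, a separate one judges it) survives across model generations. Sprint decomposition, by contrast, was removed when Opus 4.6 arrived. Context management is infrastructure, not reasoning scaffolding, so the Bitter Lesson does not apply to it: a stronger model does not make it unnecessary.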
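A minimal sketch of Generator-Evaluator separation, assuming the generator and evaluator are two independent callables (in practice, separate model calls with separate prompts). The generator sees only the task and the evaluator's latest feedback; the evaluator sees only the task and the finished candidate. `generate_and_evaluate`, `Verdict`, `max_rounds`, and the toy stand-ins are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    passed: bool
    feedback: str


def generate_and_evaluate(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    evaluate: Callable[[str, str], Verdict],
    max_rounds: int = 3,
) -> tuple[str, bool]:
    """Alternate a generator and an independent evaluator until a candidate passes.

    Returns the last candidate and whether it passed within `max_rounds`.
    """
    feedback: Optional[str] = None
    candidate = ""
    for _ in range(max_rounds):
        # The generator sees the task plus the evaluator's most recent feedback.
        candidate = generate(task, feedback)
        # The evaluator sees only the task and the finished candidate.
        verdict = evaluate(task, candidate)
        if verdict.passed:
            return candidate, True
        feedback = verdict.feedback
    return candidate, False


# Toy stand-ins for two separate model calls (hypothetical, for illustration only).
def toy_generate(task: str, feedback: Optional[str]) -> str:
    # First attempt is wrong; once feedback arrives, the answer is corrected.
    return "5" if feedback else "6"


def toy_judge(task: str, candidate: str) -> Verdict:
    ok = candidate.strip() == str(2 + 3)
    return Verdict(passed=ok, feedback="" if ok else "2 + 3 should equal 5")


result, passed = generate_and_evaluate("compute 2 + 3", toy_generate, toy_judge)
print(passed, result)  # True 5
```

Keeping the evaluator blind to the generator's reasoning is what makes the separation useful: the judge cannot be talked into accepting an answer by the same chain of thought that produced it.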

The 2AM Debuggability Test

  • Can one person hold the entire architecture in their head?
  • Are failure modes explicit (visible in logs) rather than emergent (arising from component interactions)?
  • Can you explain why each component exists without historical context?

If any answer is "no," simplify.
