sharaf/agentic-harness-architect

Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system — covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery (greenfield design or audit mode).

Quality: 100% (Does it follow best practices?)
Impact: 100%, 1.23x (average score across 4 eval scenarios)
Security: Passed, no known issues

Success Criteria

Name: sharaf/agentic-harness-architect
Rating: 100 (1 review)
Author: sharaf

The design is complete when it satisfies architecture completeness, hits target benchmarks, and passes the quality gates below.

Architecture completeness

Every phase (1-11) has an explicit decision with rationale
Architecture level (0-4) is stated and justified
Loop pattern is selected with termination mechanisms specified
Tool count is within budget (8-12 tools per agent context, with tool definitions under 20% of the context budget)
Context management strategy matches expected task duration
Evaluation design separates generation from evaluation (or justifies why not)
Error handling includes LoopGuard, checkpointing, and retry/pivot/escalate logic
Prompt architecture specifies what to include and what to leave implicit
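
The checklist names LoopGuard and retry/pivot/escalate logic without defining them, so the following is only a minimal sketch of one possible reading: a guard that caps total loop steps and escalates its response as the same action repeats. The class shape, the `max_steps` and `max_repeats` defaults, and the mapping from repeat count to response are all assumptions made for illustration, not the skill's own definition.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LoopGuard:
    """Hypothetical guard: caps total steps and flags repeated identical actions."""
    max_steps: int = 50      # assumed hard cap on loop iterations
    max_repeats: int = 3     # assumed limit on identical (tool, args) calls
    steps: int = 0
    seen: Counter = field(default_factory=Counter)

    def check(self, tool: str, args: str) -> str:
        """Return 'continue', 'retry', 'pivot', or 'escalate' for a proposed action."""
        self.steps += 1
        if self.steps > self.max_steps:
            return "escalate"            # step budget exhausted: hand off to a human
        self.seen[(tool, args)] += 1
        count = self.seen[(tool, args)]
        if count == 1:
            return "continue"            # first time this exact action is seen
        if count < self.max_repeats:
            return "retry"               # tolerate a limited number of repeats
        if count == self.max_repeats:
            return "pivot"               # ask the agent to change approach
        return "escalate"                # still looping after the pivot signal


guard = LoopGuard()
decisions = [guard.check("run_tests", "") for _ in range(4)]
print(decisions)
```

A harness built this way would call `check` before executing each tool call and save a checkpoint whenever the result is `pivot` or `escalate`, so that work can resume from the last known-good state.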

Benchmarks to target

Metric	Target	Source
Tool definitions	< 20% of context budget	Action Space research
Context utilization	40-60% (FIC target)	Context Window research
Observation signal density	> 50% useful tokens	Observation Formatting research
Generator-Critic iterations	2-3 max	Evaluation research
Agent count	2-4 (saturation threshold)	Multi-Agent research
Decomposition depth	Max 3 levels	Task Decomposition research
Subtasks per level	Max 12	Task Decomposition research
Evaluation leniency	+/- 0.1 on normalized scale	Evaluation research
Error recovery rate	> 70% (73.5% on failed tasks benchmark)	Error Handling research
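
The targets in the table above can be turned into an automated check. The sketch below is one hedged way to do that: the metric key names are hypothetical, ratios are assumed to be expressed as fractions between 0 and 1, and the "2-4" agent-count and "2-3" iteration targets are read as upper bounds so that a single-agent, single-pass design does not fail.

```python
def check_benchmarks(m: dict) -> list[str]:
    """Return the names of metrics that miss the targets listed in the table."""
    checks = {
        # Tool definitions: < 20% of context budget
        "tool_definition_share":
            m["tool_definition_tokens"] / m["context_budget_tokens"] < 0.20,
        # Context utilization: 40-60%
        "context_utilization": 0.40 <= m["context_utilization"] <= 0.60,
        # Observation signal density: > 50% useful tokens
        "observation_signal_density": m["observation_signal_density"] > 0.50,
        # Generator-Critic iterations: 3 at most (assumed upper bound)
        "generator_critic_iterations": m["generator_critic_iterations"] <= 3,
        # Agent count: 4 at most (assumed upper bound)
        "agent_count": m["agent_count"] <= 4,
        # Decomposition depth: max 3 levels
        "decomposition_depth": m["decomposition_depth"] <= 3,
        # Subtasks per level: max 12
        "subtasks_per_level": m["max_subtasks_per_level"] <= 12,
        # Evaluation leniency: within +/- 0.1 on a normalized scale
        "evaluation_leniency": abs(m["evaluation_leniency"]) <= 0.1,
        # Error recovery rate: > 70%
        "error_recovery_rate": m["error_recovery_rate"] > 0.70,
    }
    return [name for name, ok in checks.items() if not ok]


# Illustrative numbers only; they are not measurements from any real harness.
example = {
    "tool_definition_tokens": 30_000,
    "context_budget_tokens": 200_000,
    "context_utilization": 0.50,
    "observation_signal_density": 0.60,
    "generator_critic_iterations": 2,
    "agent_count": 3,
    "decomposition_depth": 2,
    "max_subtasks_per_level": 8,
    "evaluation_leniency": 0.05,
    "error_recovery_rate": 0.75,
}
print(check_benchmarks(example))
```

An empty list means every target is met; any returned name points at the table row to revisit.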

Quality gates

Passes the 2AM Debuggability Test (one person can hold the architecture in their head)
Passes the Removal Test for every component (can name exactly what breaks without it)
Every component passes the Complexity Justification Matrix (demonstrated failure, model insufficiency, proportionality, temporality)
Design document includes conditions under which to revisit each decision
Open questions are identified and isolated
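
The per-component gates above (Removal Test, Complexity Justification Matrix, revisit conditions) could be recorded as structured data so that gaps are easy to spot. This is a rough sketch under that assumption: the `ComponentReview` record and its field names are hypothetical, and a gate is treated as passed only when a written, non-empty justification exists.

```python
from dataclasses import dataclass


@dataclass
class ComponentReview:
    """Hypothetical review record for one component of the design."""
    name: str
    breaks_without_it: str      # Removal Test: what exactly breaks if removed
    demonstrated_failure: str   # Complexity Justification Matrix criteria
    model_insufficiency: str
    proportionality: str
    temporality: str
    revisit_when: str           # condition under which to revisit the decision


def missing_justifications(review: ComponentReview) -> list[str]:
    """List the gate fields that were left blank for this component."""
    return [
        field_name
        for field_name, value in vars(review).items()
        if field_name != "name" and not value.strip()
    ]


# Illustrative entry; the component and its justifications are made up.
review = ComponentReview(
    name="LoopGuard",
    breaks_without_it="Agent can repeat a failing command indefinitely",
    demonstrated_failure="Observed repeated identical tool calls in test runs",
    model_insufficiency="Prompting alone did not stop the repetition",
    proportionality="Small counter, no extra model calls",
    temporality="",
    revisit_when="Repetition no longer appears without the guard",
)
print(missing_justifications(review))
```

Running this over every component gives a simple list of which justifications still need to be written before the design counts as complete.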
