sharaf/agentic-harness-architect

Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system — covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery (greenfield design or audit mode).

Phase 4: Action Space Design

Name: sharaf/agentic-harness-architect
Rating: 100 (1 review)
Author: sharaf

Design the tool set the agent will use. Tool granularity should match model capability.

Granularity by model tier

Model tier	Strategy	Example
Frontier (Opus 4+, GPT-5)	Coarse — bash + file editor	Claude Code minimal scaffold
Strong general (Sonnet 4, GPT-4o)	Medium — 5-10 curated tools	SWE-agent ACI
Mid-tier (Haiku, GPT-4o-mini)	Fine-grained with guardrails	Structured tools with validation
Small/open-source (<70B)	Maximum structure, atomic operations	Strictly typed, no CodeAct

Tool count ceiling

Cap each agent context at 8-12 tools: accuracy drops from ~95% with 4 tools to ~71% with 46 tools. Keep tool definitions under 20% of the context budget. For broader needs, use dynamic tool loading or a sub-agent architecture.
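
The ceiling and the 20% budget can be checked when the harness registers its tools. The following is a minimal sketch, not a definitive implementation: the names check_tool_budget, MAX_TOOLS, and MAX_DEFINITION_SHARE are hypothetical, and the 4-characters-per-token estimate is an assumption that should be replaced with the target model's real tokenizer.

```python
import json

MAX_TOOLS = 12               # upper end of the 8-12 ceiling
MAX_DEFINITION_SHARE = 0.20  # tool definitions must stay under 20% of context


def estimate_tokens(text: str) -> int:
    """Rough estimate of ~4 characters per token (assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def check_tool_budget(tools: list[dict], context_window_tokens: int) -> list[str]:
    """Return human-readable violations; an empty list means the tool set fits."""
    problems = []
    if len(tools) > MAX_TOOLS:
        problems.append(
            f"{len(tools)} tools registered; ceiling is {MAX_TOOLS}. "
            "Consider dynamic tool loading or sub-agents."
        )
    # Measure the serialized definitions, since that is what occupies context.
    definition_tokens = sum(estimate_tokens(json.dumps(t)) for t in tools)
    limit = int(context_window_tokens * MAX_DEFINITION_SHARE)
    if definition_tokens >= limit:
        problems.append(
            f"Tool definitions use ~{definition_tokens} tokens; "
            f"they must stay under {limit} (20% of the context budget)."
        )
    return problems
```

Returning a list of violations instead of raising lets the harness report every problem in one pass before it starts the loop.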

File editing strategy

Condition	Strategy
File < 200 lines	str_replace with exact unique match
File ≥ 200 lines, change < 20 lines	str_replace with expanded context
Change ≥ 20 lines, AST available	Semantic editing via fully qualified name (FQN) or AST node
Change ≥ 20 lines, no AST	Whole-file replacement with syntax/lint validation
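
The table can be read as a first-match decision rule. A minimal sketch, under two assumptions that the table itself does not state: rows are evaluated top to bottom, and boundary values (exactly 200 file lines, exactly 20 changed lines) fall into the larger bucket. EditStrategy and choose_edit_strategy are hypothetical names.

```python
from enum import Enum


class EditStrategy(Enum):
    STR_REPLACE_EXACT = "str_replace with exact unique match"
    STR_REPLACE_EXPANDED = "str_replace with expanded context"
    SEMANTIC_AST = "semantic editing via AST node"
    WHOLE_FILE = "whole-file replacement with syntax/lint validation"


def choose_edit_strategy(file_lines: int, changed_lines: int,
                         ast_available: bool) -> EditStrategy:
    """Pick an editing strategy by walking the table rows in order."""
    # Row 1: small files are cheap to match against exactly.
    if file_lines < 200:
        return EditStrategy.STR_REPLACE_EXACT
    # Row 2: large file, small change; widen the match for uniqueness.
    if changed_lines < 20:
        return EditStrategy.STR_REPLACE_EXPANDED
    # Rows 3-4: large change; prefer structural edits when a parser exists.
    if ast_available:
        return EditStrategy.SEMANTIC_AST
    return EditStrategy.WHOLE_FILE
```

For example, choose_edit_strategy(1500, 60, False) selects whole-file replacement, which is the one case where the harness should run a syntax or lint check before accepting the edit.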

Poka-yoke (error-proofing) principles

Require absolute filepaths (eliminates working-directory errors)
Use content-based addressing, not line numbers (line numbers break between read and edit)
Validate parameters with schemas before execution
Return actionable errors with expected format, constraints, and correction examples
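
The four principles above can be combined in a single edit tool. This is a sketch under stated assumptions: the argument names (path, old_str, new_str), the ToolInputError class, and both function names are hypothetical, and paths are checked with POSIX rules on the assumption that the agent runs in a Linux sandbox.

```python
from pathlib import PurePosixPath

EXAMPLE_CALL = "{'path': '/repo/src/app.py', 'old_str': 'x = 1', 'new_str': 'x = 2'}"


class ToolInputError(ValueError):
    """Error whose message tells the model how to correct the call."""


def validate_edit_args(args: dict) -> None:
    """Check parameters before touching the filesystem."""
    for key in ("path", "old_str", "new_str"):
        if not isinstance(args.get(key), str):
            raise ToolInputError(
                f"'{key}' is required and must be a string. Example: {EXAMPLE_CALL}"
            )
    if not PurePosixPath(args["path"]).is_absolute():
        raise ToolInputError(
            f"'path' must be absolute, got '{args['path']}'. Example: {EXAMPLE_CALL}"
        )
    if args["old_str"] == "":
        raise ToolInputError("'old_str' must not be empty; copy the text to replace.")


def apply_str_replace(content: str, old_str: str, new_str: str) -> str:
    """Content-addressed edit: old_str must match exactly once."""
    count = content.count(old_str)
    if count == 0:
        raise ToolInputError(
            "old_str not found. Re-read the file and copy the text exactly, "
            "including whitespace."
        )
    if count > 1:
        raise ToolInputError(
            f"old_str matches {count} locations. Include more surrounding "
            "lines so it matches exactly once."
        )
    return content.replace(old_str, new_str, 1)
```

Each error names the offending parameter, the constraint, and a corrected example, so the model can retry without another read of the tool documentation.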

Sandboxing strategy

Context	Strategy	Trade-off
Trusted code, internal team	Hardened Docker + seccomp + AppArmor	Low overhead
Untrusted code, single-tenant	gVisor containers	10-30% I/O overhead, strong isolation
Multi-tenant, untrusted	Firecracker microVMs	~125ms boot, <5 MiB/VM, hardware isolation
Regulated, zero-trust	Kata Containers on Kubernetes	~200ms boot
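
For the first two rows, a harness might assemble the container invocation as follows. This is a sketch only: the function name, resource limits, mount layout, and profile names are illustrative assumptions, the seccomp and AppArmor profiles must already exist on the host, and the runsc runtime is available only if gVisor has been installed and registered with Docker.

```python
def docker_sandbox_argv(image: str, command: list[str], *, workdir_host: str,
                        seccomp_profile: str, apparmor_profile: str,
                        use_gvisor: bool = False) -> list[str]:
    """Build a `docker run` argument list for one sandboxed tool call."""
    argv = [
        "docker", "run", "--rm",
        "--network=none",                          # no network access by default
        "--cap-drop=ALL",                          # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
        "--security-opt", f"seccomp={seccomp_profile}",
        "--security-opt", f"apparmor={apparmor_profile}",
        "--pids-limit", "256",                     # illustrative limit
        "--memory", "2g",                          # illustrative limit
        "--read-only",                             # read-only root filesystem
        "--tmpfs", "/tmp",
        "--volume", f"{workdir_host}:/workspace:rw",
        "--workdir", "/workspace",
    ]
    if use_gvisor:
        # Untrusted single-tenant code: switch to the gVisor runtime.
        argv.append("--runtime=runsc")
    return argv + [image, *command]
```

The list can be passed directly to subprocess.run without a shell, which avoids quoting problems when the command comes from model output.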

Read-write separation

Separate observation tools (read-only) from mutation tools (write). This separation enables an observation-only mode, graduated permission escalation (an ask/allow/deny model), and audit trails.
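
One way to realize this separation is a gate that every tool call passes through. A minimal sketch, assuming hypothetical names (PermissionGate, Decision) and an in-memory audit log; a real harness would persist the log and route ASK decisions to a human or policy engine.

```python
import time
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"
    DENY = "deny"


@dataclass
class PermissionGate:
    read_tools: set[str]
    write_tools: set[str]
    observation_only: bool = False
    approved: set[str] = field(default_factory=set)
    audit_log: list[dict] = field(default_factory=list)

    def decide(self, tool: str) -> Decision:
        """Classify a tool call and record the decision in the audit trail."""
        if tool in self.read_tools:
            decision = Decision.ALLOW      # observation is always permitted
        elif tool not in self.write_tools:
            decision = Decision.DENY       # unknown tools are rejected
        elif self.observation_only:
            decision = Decision.DENY       # mutations disabled in this mode
        elif tool in self.approved:
            decision = Decision.ALLOW      # previously escalated
        else:
            decision = Decision.ASK        # needs explicit approval first
        self.audit_log.append(
            {"time": time.time(), "tool": tool, "decision": decision.value}
        )
        return decision

    def approve(self, tool: str) -> None:
        """Escalate a write tool from ASK to ALLOW for later calls."""
        if tool in self.write_tools:
            self.approved.add(tool)
```

Starting with an empty approved set and growing it per tool gives the graduated escalation described above, and setting observation_only to True turns the same agent into a read-only auditor.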