sharaf/agentic-harness-architect

SKILL.md

name:
agentic-harness-architect
description:
Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system — covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery (greenfield design or audit mode).
metadata:
{"version":"0.3.6","source_domain":"agentic-coding-harness","source_sub_domains":"agent-loop-design, action-space-design, observation-formatting, context-window-management, multi-agent-orchestration, evaluation-and-feedback-loops, prompt-engineering-for-agents, error-handling-and-recovery, task-decomposition, iterative-simplification","research_date":"2026-03-27"}

Agentic Harness Architect

Purpose

Design a complete agentic coding harness from requirements, or audit an existing harness for architectural improvements. The harness — not the model — determines the quality ceiling.

Pick a mode:

  • greenfield — design a new harness from stated requirements
  • audit — analyze an existing harness and produce an improvement plan

Output a structured design document with architecture decisions, trade-offs, and justifications grounded in benchmarks across 10 sub-domains (781 sources).

Workflow

The design covers 10 architectural phases plus an output phase (Phase 11). For greenfield, walk through phases 1-11 in order. For audit, start at Phase 1 to inventory the current system, then jump to the phases that target known issues; always consult Phase 10 (simplification) before recommending additions.
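The ordering rule can be sketched as a small function. This is a minimal illustration, not part of the skill itself: the function name `phase_order` and the `issue_phases` parameter are hypothetical, and placing Phase 10 immediately before the output phase in audit mode is an assumption about how "always consult Phase 10 before recommending additions" is sequenced.

```python
SIMPLIFICATION_PHASE = 10
OUTPUT_PHASE = 11


def phase_order(mode, issue_phases=()):
    """Return the phase numbers to visit, in order, for the given mode.

    `issue_phases` (hypothetical parameter) lists the phases that target
    known issues in audit mode; it is ignored for greenfield.
    """
    if mode == "greenfield":
        # Greenfield walks every phase, 1 through 11, in order.
        return list(range(1, OUTPUT_PHASE + 1))
    if mode == "audit":
        # Inventory first (Phase 1), then only the targeted phases.
        fixed = {1, SIMPLIFICATION_PHASE, OUTPUT_PHASE}
        targeted = sorted(set(issue_phases) - fixed)
        # Assumption: simplification is consulted right before the output phase.
        return [1, *targeted, SIMPLIFICATION_PHASE, OUTPUT_PHASE]
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, an audit with known context and error-handling problems would call `phase_order("audit", issue_phases=[7, 5])` and visit phases 1, 5, 7, 10, and 11.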

Phase Index

| # | Decision | Reference |
|---|----------|-----------|
| 1 | Requirements analysis | phase-01-requirements.md |
| 2 | Single vs. multi-agent + topology | phase-02-architecture-selection.md |
| 3 | Loop design + termination + reasoning budget | phase-03-loop-design.md |
| 4 | Action space, tool granularity, sandboxing | phase-04-action-space.md |
| 5 | Observation formatting + context management | phase-05-observations-and-context.md |
| 6 | Evaluation architecture + rubrics | phase-06-evaluation.md |
| 7 | Error handling, LoopGuard, recovery | phase-07-error-handling.md |
| 8 | Prompt engineering | phase-08-prompt-engineering.md |
| 9 | Task decomposition | phase-09-task-decomposition.md |
| 10 | Simplification audit | phase-10-simplification.md |
| 11 | Produce design document | (output format below) |

Cross-cutting references

  • guardrails.md — must-not-do list spanning all phases. Read before finalizing the design.
  • decision-flowcharts.md — flowcharts for architecture sizing, loop selection, context strategy, evaluation, decomposition, error recovery, and simplification. Use when a decision feels ambiguous.
  • success-criteria.md — benchmarks to target and quality gates the design must pass.

All linked reference files are bundled with this tile under references/; they are on-demand detail, not external documentation. Load only the phase file or cross-cutting reference needed for the current decision.
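A harness that loads references on demand might resolve a phase number to a single file rather than reading the whole directory. The sketch below assumes the file names from the Phase Index live directly under `references/`; the names `PHASE_REFERENCES` and `reference_path` are hypothetical.

```python
from pathlib import PurePosixPath

# File names transcribed from the Phase Index; Phase 11 has no reference file.
PHASE_REFERENCES = {
    1: "phase-01-requirements.md",
    2: "phase-02-architecture-selection.md",
    3: "phase-03-loop-design.md",
    4: "phase-04-action-space.md",
    5: "phase-05-observations-and-context.md",
    6: "phase-06-evaluation.md",
    7: "phase-07-error-handling.md",
    8: "phase-08-prompt-engineering.md",
    9: "phase-09-task-decomposition.md",
    10: "phase-10-simplification.md",
}


def reference_path(phase):
    """Return the bundled reference path for one phase, or None for Phase 11."""
    name = PHASE_REFERENCES.get(phase)
    if name is None:
        return None
    # Assumption: all reference files sit directly under references/.
    return PurePosixPath("references") / name
```

Loading one path per decision keeps the reference material out of context until the phase that needs it.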

Quick Decision Triage

One-liners to anchor where to start; consult phase references for the full rationale.

  • Single-agent default. Multi-agent only when task exceeds one context, has parallelizable read subtasks, or needs separated evaluation for subjective quality. Saturation threshold ~4 agents.
  • Loop pick by duration: minutes → ReAct; minutes-hours quality-critical → Generator-Critic (cap 3 iterations); hours mechanical → Ralph Loop; hours unpredictable → Magentic-One Dual-Loop; hours parallelizable → Orchestrator-Worker.
  • Tool ceiling: 8-12 tools per agent, <20% of context budget. Accuracy drops from ~95% (4 tools) to ~71% (46 tools).
  • Context management by duration: <30 min: none; 30 min-3h: FIC at phase boundaries (40-60% utilization target); 3h+: FIC with sub-agents or one-session-per-task.
  • Context compaction hierarchy: raw context retention → tool result clearing → observation masking → structured summarization → free-form summarization last. Preserve failed attempts and error traces through every compaction.
  • Evaluation: separate generation from evaluation unless deterministic tests cover the output. Self-evaluation bias inflates scores 4-9%.
  • Prompt architecture artifacts: when producing system-prompt.md or prompt-architecture.md, include a named Just-in-Time Steering Protocol section. State that decision-specific guidance is injected immediately before relevant tool calls or decision points, not front-loaded into the base system prompt.
  • Prompt conditional rules: include a compact Conditional Rules section with explicit if / then / otherwise branches for permission decisions, large-file edits, verification, and recovery.
  • Decompose when changes span 4+ files, duration exceeds the coherence window, or context will exceed 60-70%. Max 3 hierarchy levels, max 12 subtasks per level.
  • Before adding any component: apply the Removal Test ("what breaks if I leave it out?"). If you cannot name the failure case, don't add it.
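Several of the triage rules above are simple threshold checks, so they can be sketched as lookup helpers. The thresholds are transcribed from the bullets; everything else is an assumption made for illustration: the function and constant names are hypothetical, exactly 3 hours is treated as "3h+", the tool budget is read as the token share of tool definitions, and the decomposition check uses the lower bound (60%) of the stated 60-70% range.

```python
# Loop choices keyed by (duration, task profile), as listed in the triage bullets.
LOOP_BY_PROFILE = {
    ("minutes", None): "ReAct",
    ("minutes-hours", "quality-critical"): "Generator-Critic (cap 3 iterations)",
    ("hours", "mechanical"): "Ralph Loop",
    ("hours", "unpredictable"): "Magentic-One Dual-Loop",
    ("hours", "parallelizable"): "Orchestrator-Worker",
}
MAX_TOOLS = 12                 # upper end of the 8-12 tool ceiling
MAX_TOOL_CONTEXT_SHARE = 0.20  # tools should use <20% of the context budget


def pick_loop(duration, profile=None):
    """Look up the loop for a duration/profile pair; fail loudly if unlisted."""
    key = (duration, profile)
    if key not in LOOP_BY_PROFILE:
        raise ValueError(f"no triage rule for {key!r}; consult the phase references")
    return LOOP_BY_PROFILE[key]


def context_strategy(duration_minutes):
    """Map expected task duration to the context management tier."""
    if duration_minutes < 30:
        return "none"
    if duration_minutes < 180:  # assumption: exactly 3h falls in the 3h+ tier
        return "FIC at phase boundaries (40-60% utilization target)"
    return "FIC with sub-agents or one-session-per-task"


def within_tool_ceiling(tool_count, tool_tokens, context_budget_tokens):
    """Check both the tool count and the share of context spent on tools."""
    return (
        tool_count <= MAX_TOOLS
        and tool_tokens < MAX_TOOL_CONTEXT_SHARE * context_budget_tokens
    )


def should_decompose(files_changed, projected_context_share,
                     exceeds_coherence_window=False):
    """Any one trigger is enough to decompose the task."""
    return (
        files_changed >= 4
        or exceeds_coherence_window
        or projected_context_share > 0.60  # assumption: lower bound of 60-70%
    )
```

These helpers only anchor a starting point; the phase references carry the rationale and the edge cases that a lookup table cannot express.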

Inputs to gather before starting

For greenfield: task profile, quality target, duration profile, cost constraints, model access, verification infrastructure, security requirements, human-in-the-loop posture.

For audit: current architecture description or codebase, known failure modes and pain points, performance metrics if available (completion rate, token usage, cost per task).

Full checklist in phase-01-requirements.md.

Output Format (Phase 11)

Greenfield mode

## Requirements Summary
## Architecture Decision: [Single-Agent | Multi-Agent Topology]
## Loop Design
## Action Space
## Observation Formatting Strategy
## Context Management Strategy
## Evaluation Design
## Error Handling & Recovery
## Prompt Architecture
## Decomposition Strategy
## Complexity Budget & Simplification Plan
## Key Metrics to Track
## Open Questions

Each section states: the decision, the rationale citing specific benchmarks, alternatives considered, and conditions under which to revisit.
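One way to keep sections from skipping a required part is to model each one as a record and check it before rendering. This is a sketch under assumptions: the class name `DesignSection`, its field names, and the rendered labels are hypothetical and are not prescribed by the skill.

```python
from dataclasses import dataclass


@dataclass
class DesignSection:
    """One greenfield section with the four parts every section must state."""
    title: str
    decision: str
    rationale: str            # should cite specific benchmarks
    alternatives: list[str]
    revisit_when: str

    def missing_parts(self):
        """Return the names of required parts that are empty."""
        parts = {
            "decision": self.decision,
            "rationale": self.rationale,
            "alternatives": self.alternatives,
            "revisit_when": self.revisit_when,
        }
        return [name for name, value in parts.items() if not value]

    def render(self):
        """Render the section under its '## ' heading; labels are illustrative."""
        missing = self.missing_parts()
        if missing:
            raise ValueError(f"{self.title}: missing {', '.join(missing)}")
        return "\n".join([
            f"## {self.title}",
            f"Decision: {self.decision}",
            f"Rationale: {self.rationale}",
            "Alternatives considered: " + "; ".join(self.alternatives),
            f"Revisit when: {self.revisit_when}",
        ])
```

Refusing to render an incomplete section makes a missing rationale or revisit condition visible at drafting time instead of at review time.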

Audit mode

## Current Architecture Summary
## Identified Issues (ordered by leverage)
## Improvement Sequence
## Components to Remove (Ablation Candidates)
## Components to Add
## Migration Path
## Open Questions

Each issue includes: severity, evidence, why it matters, recommended change, and expected impact with benchmarks.
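The audit issue fields, and the "ordered by leverage" requirement from the Identified Issues heading, can be sketched as a record plus a sort. The skill does not define how leverage is measured, so the numeric `leverage` field here is a hypothetical stand-in, as are the class and function names.

```python
from dataclasses import dataclass


@dataclass
class AuditIssue:
    """One identified issue with the five parts each issue must include."""
    title: str
    severity: str
    evidence: str
    why_it_matters: str
    recommended_change: str
    expected_impact: str      # should reference benchmarks
    leverage: float           # hypothetical score; higher means fix sooner


def order_by_leverage(issues):
    """Return issues sorted from highest to lowest leverage.

    sorted() is stable, so issues with equal leverage keep their input order.
    """
    return sorted(issues, key=lambda issue: issue.leverage, reverse=True)
```

Sorting on an explicit score forces the auditor to state why one issue outranks another, which the Improvement Sequence section can then reuse.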
