sharaf/agentic-harness-architect (Tessl)

Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system, covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Works in greenfield design mode or audit mode. Use it when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery.

Quality: 100% (does it follow best practices?)
Impact: 100%, 1.23x (average score across 4 eval scenarios)
Security by Snyk: Passed, no known issues

Success Criteria

The design is complete when it satisfies every architecture-completeness item, hits the target benchmarks, and passes the quality gates below.

Architecture completeness

  • Every phase (1-11) has an explicit decision with rationale
  • Architecture level (0-4) is stated and justified
  • Loop pattern is selected with termination mechanisms specified
  • Tool count is within budget: 8-12 tools per agent context, with tool definitions taking < 20% of the context budget
  • Context management strategy matches expected task duration
  • Evaluation design separates generation from evaluation (or justifies why not)
  • Error handling includes LoopGuard, checkpointing, and retry/pivot/escalate logic
  • Prompt architecture specifies what to include and what to leave implicit
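The mechanically checkable items above can be turned into a quick lint pass over a design document. This is a minimal sketch under stated assumptions: the `DesignDoc` fields, the error-handling labels, and the function name `completeness_gaps` are hypothetical, and the 8-12 tool budget is treated as a ceiling of 12. Items that need judgment (whether the context strategy matches task duration, what the prompt leaves implicit) are deliberately left for human review.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical labels for the three required error-handling mechanisms.
REQUIRED_ERROR_HANDLING = {"loop_guard", "checkpointing", "retry_pivot_escalate"}
MAX_TOOLS_PER_AGENT = 12  # upper end of the 8-12 budget, treated here as a ceiling


@dataclass
class DesignDoc:
    # Illustrative fields only; adapt to whatever format the design document uses.
    phases: dict[int, tuple[str, str]]  # phase number (1-11) -> (decision, rationale)
    architecture_level: int              # expected to be 0-4
    level_justification: str
    loop_pattern: str
    termination_mechanisms: list[str]
    tools_per_agent: dict[str, int]      # agent name -> tools loaded in its context
    separates_generation_from_evaluation: bool
    evaluation_justification: str = ""   # required when evaluation is not separated
    error_handling: set[str] = field(default_factory=set)


def completeness_gaps(doc: DesignDoc) -> list[str]:
    """Return the mechanically checkable checklist items the design still misses."""
    gaps: list[str] = []
    for phase in range(1, 12):
        decision, rationale = doc.phases.get(phase, ("", ""))
        if not decision.strip() or not rationale.strip():
            gaps.append(f"phase {phase}: missing decision or rationale")
    if doc.architecture_level not in range(5) or not doc.level_justification.strip():
        gaps.append("architecture level (0-4) not stated or not justified")
    if not doc.loop_pattern.strip() or not doc.termination_mechanisms:
        gaps.append("loop pattern or termination mechanisms missing")
    for agent, count in doc.tools_per_agent.items():
        if count > MAX_TOOLS_PER_AGENT:
            gaps.append(f"agent {agent!r}: {count} tools exceeds budget")
    if not doc.separates_generation_from_evaluation and not doc.evaluation_justification.strip():
        gaps.append("evaluation not separated from generation and no justification given")
    missing = REQUIRED_ERROR_HANDLING - doc.error_handling
    if missing:
        gaps.append(f"error handling missing: {', '.join(sorted(missing))}")
    # Context-strategy fit and prompt-architecture scope still need human review.
    return gaps


doc = DesignDoc(
    phases={p: ("decided", "because") for p in range(1, 12)},
    architecture_level=2,
    level_justification="single agent plus critic",
    loop_pattern="plan-act-observe",
    termination_mechanisms=["max_turns", "explicit done signal"],
    tools_per_agent={"coder": 10},
    separates_generation_from_evaluation=True,
    error_handling={"loop_guard", "checkpointing", "retry_pivot_escalate"},
)
print(completeness_gaps(doc))  # []
```

An empty list only means the checkable items are present; it says nothing about whether the decisions themselves are sound.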

Benchmarks to target

| Metric | Target | Source |
|---|---|---|
| Tool definitions | < 20% of context budget | Action Space research |
| Context utilization | 40-60% (FIC target) | Context Window research |
| Observation signal density | > 50% useful tokens | Observation Formatting research |
| Generator-Critic iterations | 2-3 maximum | Evaluation research |
| Agent count | 2-4 (saturation threshold) | Multi-Agent research |
| Decomposition depth | Max 3 levels | Task Decomposition research |
| Subtasks per level | Max 12 | Task Decomposition research |
| Evaluation leniency | ±0.1 on a normalized scale | Evaluation research |
| Error recovery rate | > 70% (73.5% on the failed-tasks benchmark) | Error Handling research |
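One way to keep these targets honest is to encode each as a predicate and run measured values from an eval pass against them. This is a sketch under assumptions: ratios are recorded as fractions (0.2 means 20%), the metric keys and `benchmark_failures` are hypothetical names, agent count and Generator-Critic iterations are checked only against their upper bounds, and "evaluation leniency" is read as the mean offset between evaluator scores and reference scores.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    metric: str
    description: str
    passes: Callable[[float], bool]


# Ratios are expressed as fractions (0.2 == 20%); metric keys are hypothetical.
TARGETS = [
    Target("tool_definition_share", "< 20% of context budget", lambda v: v < 0.20),
    Target("context_utilization", "40-60%", lambda v: 0.40 <= v <= 0.60),
    Target("observation_signal_density", "> 50% useful tokens", lambda v: v > 0.50),
    Target("generator_critic_iterations", "3 max", lambda v: v <= 3),
    Target("agent_count", "at most 4", lambda v: v <= 4),
    Target("decomposition_depth", "max 3 levels", lambda v: v <= 3),
    Target("subtasks_per_level", "max 12", lambda v: v <= 12),
    # Assumed meaning: mean (evaluator score - reference score) on a normalized scale.
    Target("evaluation_leniency", "within +/- 0.1", lambda v: abs(v) <= 0.1),
    Target("error_recovery_rate", "> 70%", lambda v: v > 0.70),
]


def benchmark_failures(measured: dict[str, float]) -> list[str]:
    """List every target that was not measured or missed its threshold."""
    failures: list[str] = []
    for target in TARGETS:
        if target.metric not in measured:
            failures.append(f"{target.metric}: not measured")
        elif not target.passes(measured[target.metric]):
            value = measured[target.metric]
            failures.append(f"{target.metric}={value} (target {target.description})")
    return failures
```

Encoding each row as a predicate lets one-sided bounds (max 3 levels) and two-sided ranges (40-60%) live in the same table, and reporting unmeasured metrics as failures stops a missing measurement from passing silently.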

Quality gates

  • Passes the 2AM Debuggability Test (one person can hold the architecture in their head)
  • Passes the Removal Test for every component (can name exactly what breaks without it)
  • Every component passes the Complexity Justification Matrix (demonstrated failure, model insufficiency, proportionality, temporality)
  • Design document includes conditions under which to revisit each decision
  • Open questions are identified and isolated
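The Removal Test and the Complexity Justification Matrix can be recorded per component so that a blank justification is caught before review. A minimal sketch: `ComponentRecord` and `unjustified_components` are hypothetical names, and the one-line reading of each criterion in the comments is an assumption rather than a definition from the source. Treating temporality as a revisit condition is one way to also capture when each decision should be reconsidered.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class ComponentRecord:
    # Hypothetical record. Matrix criteria: demonstrated failure, model
    # insufficiency, proportionality, temporality; the comments are illustrative readings.
    name: str
    removal_test: str          # what exactly breaks if this component is removed
    demonstrated_failure: str  # observed failure that motivated the component
    model_insufficiency: str   # why the model alone does not handle it
    proportionality: str       # why its cost matches the size of the problem
    temporality: str           # condition under which to revisit or drop it


def unjustified_components(records: list[ComponentRecord]) -> dict[str, list[str]]:
    """Map each component name to the justification fields it leaves blank."""
    report: dict[str, list[str]] = {}
    for record in records:
        blank = [
            f.name
            for f in fields(record)
            if f.name != "name" and not getattr(record, f.name).strip()
        ]
        if blank:
            report[record.name] = blank
    return report
```

A non-empty report flags components that have not yet earned their place; a filled-in record still needs a reviewer to judge whether the answers hold up.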
