
sharaf/agentic-harness-architect

Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system — covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery (greenfield design or audit mode).


Phase 9: Task Decomposition

Decompose when

  • Changes across 4+ files
  • Duration exceeds model's coherence window (Opus 4.7 ~4 hours; weaker models 30-60 min)
  • Natural independence boundaries exist (backend + frontend + DB)
  • Projected context usage will exceed 60-70% of the window's capacity

Keep monolithic when

  • Completable in under 1 hour with a capable model
  • Task requires holistic understanding (design/creative work)
  • Coordination overhead would exceed task complexity
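
The two checklists above can be sketched as a single decision helper. This is a minimal illustration, not a definitive rule: `TaskEstimate`, `should_decompose`, and the field names are hypothetical, the 0.6 default is the lower end of the 60-70% range, and the choice to let keep-monolithic signals win over decomposition triggers is an assumption the checklists do not state. The "coordination overhead" criterion is left out because it has no obvious numeric proxy.

```python
from dataclasses import dataclass


@dataclass
class TaskEstimate:
    """Hypothetical container for rough, up-front estimates about a task."""
    files_touched: int
    est_hours: float
    coherence_window_hours: float      # e.g. ~4 for a frontier model
    has_independent_parts: bool        # e.g. backend + frontend + DB
    projected_context_fraction: float  # projected share of the window, 0.0-1.0
    needs_holistic_view: bool = False  # design/creative work


def should_decompose(t: TaskEstimate, context_threshold: float = 0.6) -> bool:
    # Assumption: keep-monolithic signals are checked first and win ties.
    if t.needs_holistic_view or t.est_hours < 1.0:
        return False
    # Any single trigger from the "Decompose when" list is enough.
    return any([
        t.files_touched >= 4,
        t.est_hours > t.coherence_window_hours,
        t.has_independent_parts,
        t.projected_context_fraction > context_threshold,
    ])
```

A five-hour, six-file task with independent parts returns `True`; a thirty-minute, two-file fix returns `False`.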

Decomposition intensity tracks model capability inversely

| Model tier | Decomposition level |
| --- | --- |
| Frontier (Opus 4.7, o3) | Minimal; good specs > structural scaffolding |
| Strong general (Sonnet 4.5, GPT-4o) | Feature-level + context resets; sprint contracts for complex builds |
| Mid-tier (Haiku, GPT-4o-mini) | Aggressive decomposition; escalate architecture decisions to stronger models |
| Small/specialized (7B-13B) | Maximum decomposition with atomic subtasks |

Depth limits

Max 3 hierarchy levels. Max 12 subtasks per level. Deeper decomposition causes exponential error amplification (up to 17x).
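
One way to enforce these limits is to validate a task tree before dispatching it. The sketch below is illustrative only: `Task` and `check_limits` are hypothetical names, and counting the root task as level 1 is an assumption about how "hierarchy levels" are numbered.

```python
from dataclasses import dataclass, field

MAX_LEVELS = 3     # max hierarchy levels
MAX_SUBTASKS = 12  # max subtasks under any one task


@dataclass
class Task:
    name: str
    subtasks: list["Task"] = field(default_factory=list)


def check_limits(task: Task, level: int = 1) -> list[str]:
    """Return a list of limit violations found in the tree rooted at `task`."""
    problems = []
    if level > MAX_LEVELS:
        problems.append(f"{task.name}: level {level} exceeds {MAX_LEVELS}")
    if len(task.subtasks) > MAX_SUBTASKS:
        problems.append(
            f"{task.name}: {len(task.subtasks)} subtasks exceeds {MAX_SUBTASKS}"
        )
    # Recurse so every violation in the tree is reported, not just the first.
    for sub in task.subtasks:
        problems.extend(check_limits(sub, level + 1))
    return problems
```

An empty result means the plan is within both limits; otherwise each string names the offending task.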

Sprint contracts

Each sprint contract lists 20-30 testable criteria, negotiated between the generator and the evaluator before implementation begins. The evaluator verifies criteria through active interaction (Playwright, API calls, database queries), not static review. Shrinking specs pattern: completed work is removed from the spec on each iteration, so the remaining spec only ever describes open work.
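
The shrinking specs pattern can be sketched as a contract whose open criteria drop out once their checks pass. This is a minimal illustration under stated assumptions: `Criterion`, `SprintContract`, and `evaluate` are hypothetical names, and each `check` callable stands in for an active probe such as a browser test, an API call, or a database query.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Criterion:
    description: str
    check: Callable[[], bool]  # active probe; returns True when satisfied


@dataclass
class SprintContract:
    remaining: list[Criterion]
    done: list[str] = field(default_factory=list)

    def evaluate(self) -> None:
        """Run every open check and drop the criteria that now pass."""
        still_open = []
        for criterion in self.remaining:
            if criterion.check():
                self.done.append(criterion.description)
            else:
                still_open.append(criterion)
        self.remaining = still_open  # the spec shrinks each iteration

    @property
    def is_complete(self) -> bool:
        return not self.remaining
```

Calling `evaluate()` after each generator iteration leaves only unfinished criteria in `remaining`, so the next prompt can be built from the open work alone.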
