sharaf/agentic-harness-architect

Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system — covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery (greenfield design or audit mode).

Decision Flowcharts

Reach for these when a phase decision feels ambiguous. Each chart maps a question to a recommended path; consult the corresponding phase reference for full rationale.

Architecture Sizing

Is the task completable by a single agent in one context window?
├── Yes → Single agent (Level 0)
│   └── Does self-evaluation produce reliable quality signals?
│       ├── Yes (tests exist) → Single agent + deterministic eval
│       └── No (subjective) → Add separated evaluator (Level 1)
└── No → Multi-agent required
    ├── Are subtasks genuinely parallelizable?
    │   ├── Yes → Fan-Out/Gather topology
    │   └── No → Pipeline or Hierarchical
    ├── Is output verifiable automatically?
    │   └── Yes → Cascade (single first, escalate on failure)
    └── Does the task's value justify the 15x token premium?
        ├── No → Decompose into sequential single-agent tasks
        └── Yes → Select topology from phase-02 table
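
The sizing chart can be read as a small function. This is only a sketch: the chart lists the three multi-agent questions as siblings without an order, so the order used here (cost check first, then verifiability, then parallelism) is an assumption, and `size_architecture` and its parameter names are hypothetical.

```python
def size_architecture(
    fits_single_context: bool,
    has_tests: bool = False,
    premium_justified: bool = False,
    auto_verifiable: bool = False,
    parallelizable: bool = False,
) -> str:
    """Map the Architecture Sizing questions to a recommendation."""
    if fits_single_context:
        # Level 0: one agent; add a separate evaluator only when
        # self-evaluation has no tests to lean on.
        if has_tests:
            return "single agent + deterministic eval"
        return "single agent + separated evaluator (Level 1)"
    # Multi-agent territory. Assumed order: cost, verifiability, parallelism.
    if not premium_justified:
        return "sequential single-agent tasks"
    if auto_verifiable:
        return "cascade (single first, escalate on failure)"
    if parallelizable:
        return "fan-out/gather"
    return "pipeline or hierarchical"


print(size_architecture(fits_single_context=True, has_tests=True))
```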

Loop Selection

What is the task duration?
├── Minutes (1-15 tool calls) → ReAct
├── Minutes-hours, quality-critical → Generator-Critic (cap 3 iterations)
├── Hours, mechanical/verifiable → Ralph Loop (fresh context per iteration)
├── Hours, unpredictable → Magentic-One Dual-Loop
└── Hours, parallelizable → Orchestrator-Worker
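
One way to encode this chart is a lookup table keyed by duration and task trait. The key strings, `LOOP_BY_PROFILE`, and `select_loop` are illustrative names, not part of the skill; combinations the chart does not cover return `None` rather than a guess.

```python
from typing import Optional

# Branches copied from the Loop Selection chart.
LOOP_BY_PROFILE = {
    ("minutes", None): "ReAct",
    ("minutes-hours", "quality-critical"): "Generator-Critic",
    ("hours", "mechanical"): "Ralph Loop",
    ("hours", "unpredictable"): "Magentic-One Dual-Loop",
    ("hours", "parallelizable"): "Orchestrator-Worker",
}

# The chart caps Generator-Critic at three iterations.
MAX_CRITIC_ITERATIONS = 3


def select_loop(duration: str, trait: Optional[str] = None) -> Optional[str]:
    """Return the loop named by the chart, or None if no branch matches."""
    if duration == "minutes":
        # Short tasks go to ReAct regardless of trait.
        return LOOP_BY_PROFILE[("minutes", None)]
    return LOOP_BY_PROFILE.get((duration, trait))


print(select_loop("hours", "mechanical"))
```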

Context Strategy Selection

Expected task duration?
├── < 30 min → No management needed
├── 30 min - 3 hours → FIC at phase boundaries (target 40-60% context utilization)
├── 3+ hours → FIC + sub-agents OR one-session-per-task resets
└── Automated loop → One-session-per-task with external state
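
The duration thresholds translate directly into a guard chain. A minimal sketch, assuming durations are given in minutes and that the automated-loop branch takes priority over the duration branches (the chart does not say which wins); `select_context_strategy` is a hypothetical name.

```python
def select_context_strategy(expected_minutes: float, automated_loop: bool = False) -> str:
    """Map expected task duration to the chart's context strategy."""
    if automated_loop:
        # Assumption: automation overrides the duration-based branches.
        return "one-session-per-task with external state"
    if expected_minutes < 30:
        return "no management needed"
    if expected_minutes < 180:
        return "FIC at phase boundaries"
    # Exactly 3 hours falls in the "3+ hours" branch.
    return "FIC + sub-agents OR one-session-per-task resets"


for minutes in (10, 90, 240):
    print(minutes, "->", select_context_strategy(minutes))
```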

Evaluation Strategy Selection

Is the output objectively testable?
├── Yes
│   ├── Test suite exists → Deterministic eval (no LLM needed)
│   └── No test suite → LLM-generated tests → deterministic eval
└── No (subjective quality)
    ├── Consistent domain → Fixed calibrated rubric with separated evaluator
    └── Variable domain → Adaptive rubric (AdaRubric)
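
The evaluation chart is a two-level branch, sketched below. The function and parameter names are hypothetical; the returned strings mirror the chart's leaves.

```python
def select_eval_strategy(
    objectively_testable: bool,
    has_test_suite: bool = False,
    consistent_domain: bool = False,
) -> str:
    """Map the Evaluation Strategy questions to a recommendation."""
    if objectively_testable:
        if has_test_suite:
            return "deterministic eval"
        # No suite yet: generate tests first, then evaluate deterministically.
        return "LLM-generated tests, then deterministic eval"
    if consistent_domain:
        return "fixed calibrated rubric with separated evaluator"
    return "adaptive rubric (AdaRubric)"


print(select_eval_strategy(objectively_testable=False, consistent_domain=True))
```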

Decomposition Decision

Should I decompose this task?
├── Changes < 4 files AND fits in one session? → No decomposition
├── Natural independence boundaries exist? → Feature-level decomposition
├── Features have internal complexity? → Sprint-level with contracts
├── Complex dependencies? → DAG-based task graphs
└── Optimal granularity uncertain? → ADaPT (decompose on failure only)
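
Read top to bottom, the decomposition chart behaves like an ordered rule list where the first matching question wins. That reading is an assumption, as are the `TaskProfile` fields and the `select_decomposition` name; ADaPT is used as the fallback because the last branch covers the uncertain case.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical summary of a task, one field per chart question."""
    files_changed: int
    fits_one_session: bool
    independent_boundaries: bool = False
    internally_complex_features: bool = False
    complex_dependencies: bool = False


def select_decomposition(task: TaskProfile) -> str:
    """Apply the chart's questions in listed order; first match wins."""
    if task.files_changed < 4 and task.fits_one_session:
        return "no decomposition"
    if task.independent_boundaries:
        return "feature-level decomposition"
    if task.internally_complex_features:
        return "sprint-level with contracts"
    if task.complex_dependencies:
        return "DAG-based task graph"
    # Granularity unclear: decompose only when an attempt fails.
    return "ADaPT (decompose on failure only)"


print(select_decomposition(TaskProfile(files_changed=2, fits_one_session=True)))
```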

Error Recovery Strategy

What type of error?
├── Transient infrastructure → Exponential backoff (max 3 retries: 5s/15s/60s)
├── Logic error in generated code → Run tests, feed the error back to the agent; flush context if stuck
├── Doom loop detected → Revert code, flush context, restart with lessons only
├── Premature completion → QA agent validates against feature list
├── Multi-agent cascade → Circuit breaker + SagaLLM compensation
└── Context exhaustion → Checkpoint, start new session with summary
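
The transient-infrastructure branch can be sketched as a retry wrapper that uses the chart's 5s/15s/60s schedule. `TransientError` and `call_with_backoff` are hypothetical names; a real harness would substitute whichever exceptions its infrastructure actually raises. The `sleep` parameter is injectable so the schedule can be checked without waiting.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

# Delay before each retry, in seconds, taken from the chart.
RETRY_DELAYS_S = (5, 15, 60)


class TransientError(Exception):
    """Hypothetical marker for retryable infrastructure failures."""


def call_with_backoff(
    operation: Callable[[], T],
    delays: Sequence[float] = RETRY_DELAYS_S,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying once per entry in delays on TransientError."""
    for delay in delays:
        try:
            return operation()
        except TransientError:
            sleep(delay)
    # Last retry: any exception now propagates to the caller.
    return operation()
```

With the default schedule this makes one initial attempt plus at most three retries. Other error types in the chart (logic errors, doom loops) are deliberately not caught here, since they call for a different recovery path.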

Simplification Decision

Should I keep this harness component?
├── Is it infrastructure (state, persistence, error recovery)?
│   └── Yes → Keep; re-evaluate implementation, not existence
├── Is it reasoning scaffolding?
│   ├── Has the model improved since this was added?
│   │   ├── No → Keep; schedule re-evaluation at next model release
│   │   └── Yes → Run 10 representative tasks without it
│   │       ├── Success rate holds → Remove permanently
│   │       ├── Degrades → Keep; document specific failure cases
│   │       └── Mixed → Implement conditional activation
│   └── Can you name exactly what breaks without it?
│       ├── No → Remove (Removal Test failed)
│       └── Yes → Keep if it passes Complexity Justification Matrix
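
The "run 10 representative tasks without it" step amounts to a paired comparison. The sketch below assumes each task yields a pass/fail result with and without the component, and it interprets "mixed" as some tasks regressing while others improve; that interpretation, and the `ablation_verdict` name, are assumptions rather than definitions from the skill.

```python
from typing import Sequence


def ablation_verdict(with_component: Sequence[bool], without_component: Sequence[bool]) -> str:
    """Compare per-task pass/fail results with and without a component."""
    if len(with_component) != len(without_component):
        raise ValueError("both runs must cover the same tasks")
    pairs = list(zip(with_component, without_component))
    # A regression passed with the component but fails without it.
    regressions = sum(1 for w, wo in pairs if w and not wo)
    # An improvement failed with the component but passes without it.
    improvements = sum(1 for w, wo in pairs if not w and wo)
    if regressions == 0:
        return "remove"       # success rate holds
    if improvements == 0:
        return "keep"         # degrades: document the failing cases
    return "conditional"      # mixed: activate only where it helps


baseline = [True] * 10
print(ablation_verdict(baseline, [True] * 8 + [False] * 2))
```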
