CtrlK
BlogDocsLog inGet started
Tessl Logo

sharaf/agentic-harness-architect

Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system — covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery (greenfield design or audit mode).

100

1.23x
Quality

100%

Does it follow best practices?

Impact

100%

1.23x

Average score across 4 eval scenarios

SecuritybySnyk

Passed

No known issues

Overview
Quality
Evals
Security
Files

phase-07-error-handling.mdreferences/

Phase 7: Error Handling & Recovery

Design error handling architecturally, not through prompts alone.

Core components

1. LoopGuard

Monitor for repetitive behavior across three dimensions:

  • Action signature fingerprints (deduplicate tool+args)
  • Normalized error pattern matching (cosine similarity on stripped messages)
  • Semantic similarity via embeddings
  • Graduated intervention: inject reflection prompt → force different approach → escalate/terminate
  • Reduces iteration counts from 30+ to ~8

2. Checkpoint strategy by deployment type

DeploymentStrategy
Multi-session/long-runningStateless restore via progress files
Real-time recoveryStateful restore (LangGraph, CRIU)
Code generationShadow git checkpoint (commit before every modification)

3. Retry vs. pivot decision

  • Retry when: transient error, actionable error info, quality trending up, within budget (max 3 for infrastructure)
  • Pivot when: quality flat/declining across 3+ iterations, same semantic error repeats, token consumption escalating without quality gain
  • Escalate to human when: irreversible side effects, 2-3 failed pivots, ambiguous error, security-sensitive operations

4. Context poisoning prevention

  • Failed code accumulates in context, causing probability drift toward the failing pattern
  • On doom loop detection: revert code, flush failed-attempt context, restart with original task + lessons learned only
  • Preserve error traces during compaction (they serve as implicit negative examples)

5. Multi-agent error isolation

  • Schema validation at every agent boundary (typed schemas, not natural language)
  • Circuit breakers between agent clusters using 95th percentile response times
  • SagaLLM compensation agents for rollback in multi-step workflows

6. Premature completion prevention

  • Never trust agent self-assessment for completion
  • Maintain structured feature list with items initially marked failing
  • QA agent validates against the list before accepting completion
  • Pre-completion checklist middleware intercepts exit and forces verification

README.md

SKILL.md

tile.json