sharaf/agentic-harness-architect

Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system — covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery (greenfield design or audit mode).



Phase 5: Observation Formatting & Context Management

Observation formatting is a direct lever on task completion rates: 30-60% of the tokens sent to models add no value.

Observation formatting pipeline

  1. Raw capture: stdout, stderr, exit code, metadata
  2. Per-tool transformation: line numbers for files, exit-code-first for commands, ranked snippets for search
  3. Size assessment: measure against token budget
  4. Compression: summarize, truncate, or offload based on thresholds
  5. Metadata injection: truncation markers, recovery hints, file references
  6. Position optimization: highest-signal content at start/end (U-shaped attention curve)
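Steps 4 and 5 can be illustrated with a minimal head-tail truncation sketch. It assumes a rough 4-characters-per-token approximation and mostly ASCII output; the function name, the marker wording, and the `saved_path` parameter are hypothetical choices for illustration, not part of any particular harness.

```python
from typing import Optional

CHARS_PER_TOKEN = 4  # rough approximation; assumes mostly ASCII output


def head_tail_truncate(text: str, max_tokens: int, saved_path: Optional[str] = None) -> str:
    """Keep the start and end of an oversized output and mark what was dropped."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    budget_chars = max_tokens * CHARS_PER_TOKEN
    if len(text) <= budget_chars:
        return text  # small enough: pass through unchanged
    half = budget_chars // 2
    omitted_tokens = (len(text) - 2 * half) // CHARS_PER_TOKEN
    # Truncation marker (step 5) so the model knows content is missing.
    marker = f"[... ~{omitted_tokens} tokens truncated ...]"
    if saved_path is not None:
        # Recovery hint: tell the agent where the full output lives.
        marker += f" full output saved to {saved_path}"
    return f"{text[:half]}\n{marker}\n{text[-half:]}"
```

Keeping both ends of the output is one way to respect step 6, since the retained content sits at the start and end of the observation.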

The cardinal rule — "success is silent, failure is verbose"

  • Passing tests: count only
  • Failing tests: full output
  • Successful commands: confirmation line
  • Failed commands: full error with context
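A minimal sketch of this rule, assuming the harness already has pass/fail counts and captured output. `SuiteResult`, `format_tests`, and `format_command` are hypothetical names, and the exact wording of each line is only one possible choice.

```python
from dataclasses import dataclass


@dataclass
class SuiteResult:
    passed: int
    failed: int
    failure_output: str = ""  # raw output of the failing tests only


def format_tests(result: SuiteResult) -> str:
    """Passing runs collapse to a count; failing runs keep the full failure output."""
    if result.failed == 0:
        return f"{result.passed} tests passed"
    return f"{result.failed} failed, {result.passed} passed\n{result.failure_output}"


def format_command(command: str, exit_code: int, stdout: str, stderr: str) -> str:
    """Successful commands get one confirmation line; failures get exit code first, then context."""
    if exit_code == 0:
        return f"OK: {command}"
    return (
        f"exit code {exit_code}: {command}\n"
        f"--- stderr ---\n{stderr}\n"
        f"--- stdout ---\n{stdout}"
    )
```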

Truncation strategy by output size

| Output size | Strategy |
| --- | --- |
| < 2K tokens | Pass through unchanged |
| 2K-10K tokens, high signal | Per-tool summarization |
| 2K-10K tokens, low signal | Head-tail truncation |
| 10K-25K tokens | Offload to disk + preview |
| > 25K tokens | Offload + sub-agent delegation |
| Error output (any size) | Preserve verbatim up to 5K tokens |
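The thresholds above can be expressed as a small dispatch function. This is a sketch under two assumptions that the table does not settle: an output of exactly 10K tokens is treated as belonging to the 10K-25K band, and error output is checked first regardless of size. The strategy labels are hypothetical identifiers.

```python
def choose_strategy(tokens: int, high_signal: bool = False, is_error: bool = False) -> str:
    """Map an output's estimated token count to a truncation strategy label."""
    if is_error:
        # Errors are preserved verbatim (up to 5K tokens) at any size.
        return "preserve_verbatim_up_to_5k"
    if tokens < 2_000:
        return "pass_through"
    if tokens < 10_000:  # assumption: exactly 10K falls into the next band
        return "per_tool_summarization" if high_signal else "head_tail_truncation"
    if tokens <= 25_000:
        return "offload_to_disk_with_preview"
    return "offload_and_delegate_to_sub_agent"
```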

Token limits

Use token-based limits, not line-based ones. When no tokenizer is available, approximate the token count as num_bytes / 4. A ceiling of 25K tokens (the Claude Code default) is well tested.
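The byte-based approximation might look like the sketch below. `estimate_tokens`, `exceeds_ceiling`, and `MAX_OUTPUT_TOKENS` are hypothetical names, and the estimate is only a fallback for when a real tokenizer is unavailable.

```python
MAX_OUTPUT_TOKENS = 25_000  # ceiling quoted above


def estimate_tokens(text: str) -> int:
    """Approximate token count as UTF-8 byte length divided by 4."""
    return len(text.encode("utf-8")) // 4


def exceeds_ceiling(text: str, ceiling: int = MAX_OUTPUT_TOKENS) -> bool:
    """True when the estimated token count is above the ceiling."""
    return estimate_tokens(text) > ceiling
```

Counting bytes rather than characters means non-ASCII text is estimated at more tokens per character, which errs on the side of truncating earlier.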

Context management strategy by task duration

| Duration | Strategy |
| --- | --- |
| < 30 min | No management needed |
| 30 min - 3 hours | FIC (Frequent Intentional Compaction) at phase boundaries; target 40-60% utilization |
| 3+ hours | FIC with sub-agents (essential) or one-session-per-task resets |
| Automated loops | One-session-per-task with external state persistence |
| Parallel exploration | Sub-agent context isolation |

Context quality zones

  • 0-40%: High quality
  • 40-60%: Optimal (FIC target)
  • 60-80%: Degrading — activate per-tool summarizers
  • 80-85%: Replace older tool results with ~15-token reference pointers
  • 85-90%: Remove entire older turns
  • 90%+: Emergency compaction or context reset
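The zones above can be checked on every turn with a lookup like the following sketch. It assumes each band includes its lower bound and excludes its upper bound, which the list leaves open; the function name and action labels are hypothetical.

```python
def context_action(used_tokens: int, window_tokens: int) -> str:
    """Return the action for the current context utilization zone."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    pct = 100 * used_tokens / window_tokens
    if pct < 40:
        return "no_action"  # high quality
    if pct < 60:
        return "fic_target"  # optimal zone
    if pct < 80:
        return "activate_per_tool_summarizers"
    if pct < 85:
        return "replace_old_tool_results_with_pointers"
    if pct < 90:
        return "remove_older_turns"
    return "emergency_compaction_or_reset"
```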

Compaction preference hierarchy

Best to worst quality:

  1. Raw context (keep original)
  2. Tool result clearing (API-level)
  3. Observation masking (52% cheaper than summarization, +2.6% solve rate)
  4. Structured summarization with anchored sections
  5. Free-form summarization (last resort)

When producing observation-context-spec.md, include this hierarchy by name and in this order. State the default rule as:

Keep raw context when possible; if space is needed, clear bulky tool results first, then mask low-signal observations, then use anchored structured summaries. Use free-form summarization only as a last resort.

The spec must distinguish these approaches instead of saying "compact old context" generically, and it must explicitly preserve failed attempts, error traces, and rejected strategies across every compaction step.
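As an illustration of masking that preserves failures, here is a minimal sketch. The `Turn` structure, its `kind` labels, and the placeholder text are assumptions made for the example; a real harness would use its own message types.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Turn:
    kind: str  # "message" or "tool_result" (hypothetical labels)
    content: str
    is_failure: bool = False  # failed attempt, error trace, or rejected strategy


def mask_old_observations(turns: List[Turn], keep_recent: int) -> List[Turn]:
    """Replace older successful tool results with short placeholders.

    Failures and non-tool turns are kept verbatim, as are the most
    recent `keep_recent` turns.
    """
    cutoff = len(turns) - keep_recent
    masked = []
    for index, turn in enumerate(turns):
        is_old = index < cutoff
        if is_old and turn.kind == "tool_result" and not turn.is_failure:
            masked.append(replace(turn, content=f"[tool result {index} masked]"))
        else:
            masked.append(turn)
    return masked
```

The function returns a new list and leaves the input untouched, so the raw context stays available if a later step needs it.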

KV-cache preservation

Maintain stable prefixes, make context modifications append-only, and serialize JSON deterministically (sorted keys). Never remove or reorder tools mid-session: cache invalidation causes a 10x cost increase ($0.30 vs. $3.00/MTok).
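Deterministic serialization can be sketched with the standard library's `json.dumps`. The helper name `canonical_json` is hypothetical, and the compact separators are an extra assumption beyond the sorted keys mentioned above.

```python
import json
from typing import Any


def canonical_json(value: Any) -> str:
    """Serialize with sorted keys and fixed separators so equal data yields identical bytes."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
```

With this helper, `canonical_json({"b": 1, "a": 2})` and `canonical_json({"a": 2, "b": 1})` produce the same string, so a tool definition built in a different key order does not change the prompt prefix.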
