sharaf/agentic-harness-architect

Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system — covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery (greenfield design or audit mode).


evals/scenario-2/task.md

Observation & Context Strategy for a Long-Running Refactoring Agent

Background

A platform engineering team is building an agent that performs large-scale codebase refactoring: migrating a legacy Python 2 monorepo to Python 3, updating APIs, fixing type errors, and running a test suite after each batch of changes. Tasks routinely take 3 to 6 hours to complete. The agent uses Claude Sonnet as its model.

The team is hitting two interrelated problems. First, tool outputs bloat the context window quickly: test runners produce thousands of lines, search results return entire files, and the agent receives the full, unprocessed text of every output. By the midpoint of a task, the context is saturated and output quality degrades. Second, when the team has tried compacting old context to recover space, the agent has started repeating mistakes from earlier iterations, as if it had forgotten what did not work.

The team knows it needs a systematic approach to two things: how tool outputs are formatted before they enter context, and how context is managed over the multi-hour session. The approach should handle the full range of output sizes cleanly, preserve the right information across compaction events, and keep cache costs reasonable as the agent runs over many turns.
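To make "handling the full range of output sizes" concrete, here is a minimal sketch of one possible tiering rule. It is not a prescribed answer to the task: the function name `format_observation`, the `Observation` type, and every threshold are hypothetical, picked only to show the shape of the idea. Small outputs pass through unchanged, long successful outputs collapse to a one-line pointer, and long failing outputs keep their head and tail.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds, chosen only for illustration.
INLINE_MAX_LINES = 50   # at or below this, pass the output through unchanged
HEAD_LINES = 20         # lines kept from the start of a long failing output
TAIL_LINES = 20         # lines kept from the end of a long failing output


@dataclass
class Observation:
    text: str                    # what actually enters the context window
    offload_path: Optional[str]  # where the caller is expected to write the full output


def format_observation(tool: str, output: str, succeeded: bool, save_path: str) -> Observation:
    """Shrink a tool output before it enters context (sketch; does not write files)."""
    lines = output.splitlines()
    if len(lines) <= INLINE_MAX_LINES:
        # Small outputs are cheap, so keep them verbatim.
        return Observation(text=output, offload_path=None)
    if succeeded:
        # A long successful output rarely needs rereading: keep a pointer only.
        summary = f"[{tool}] succeeded; {len(lines)} lines of output saved to {save_path}"
        return Observation(text=summary, offload_path=save_path)
    # On failure, keep the head and tail, where errors and summaries usually sit.
    omitted = len(lines) - HEAD_LINES - TAIL_LINES
    marker = f"... {omitted} lines omitted; full output in {save_path} ..."
    kept = lines[:HEAD_LINES] + [marker] + lines[-TAIL_LINES:]
    return Observation(text="\n".join(kept), offload_path=save_path)
```

The success and failure branches differ on purpose: a failure is the case where the agent most needs detail, so it gets more of the raw text, while a success is reduced to a pointer the agent can follow if it ever needs the full log.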

Output Specification

Produce a technical specification document saved as observation-context-spec.md that covers:

  1. How tool outputs should be processed before being injected into context, including what to preserve in full and what to compress or offload
  2. How the agent's context should be managed across a multi-hour session, including when to compact and what to prioritize keeping
  3. How to keep API costs stable as the session grows longer
  4. Any special handling that depends on whether an operation succeeded or failed

The document should be detailed enough that an engineer could implement it directly. Be specific: include concrete thresholds and numerical targets where they exist.
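As an illustration of the kind of mechanism such a specification might describe, the sketch below shows compaction that carries failed attempts forward, so the agent does not forget what did not work. Every name (`Session`, `Turn`, `failure_ledger`) and every number here is a hypothetical assumption, not a recommended value, and the token estimate of roughly four characters per token is a crude placeholder.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical numbers, for illustration only.
CONTEXT_BUDGET_TOKENS = 200_000  # assumed usable context window
COMPACT_AT_FRACTION = 0.75       # compact once usage crosses this share of the budget
KEEP_RECENT_TURNS = 10           # most recent turns kept verbatim


@dataclass
class Turn:
    text: str
    tokens: int
    failed: bool = False  # True when the turn records an approach that did not work


@dataclass
class Session:
    turns: List[Turn] = field(default_factory=list)
    failure_ledger: List[str] = field(default_factory=list)

    def used_tokens(self) -> int:
        return sum(t.tokens for t in self.turns)

    def should_compact(self) -> bool:
        return self.used_tokens() >= CONTEXT_BUDGET_TOKENS * COMPACT_AT_FRACTION

    def compact(self) -> None:
        old = self.turns[:-KEEP_RECENT_TURNS]
        recent = self.turns[-KEEP_RECENT_TURNS:]
        if not old:
            return  # nothing older than the recent window to compact
        # Carry every failed attempt forward; all other old turns are dropped.
        self.failure_ledger.extend(t.text for t in old if t.failed)
        summary = "Approaches already tried that did NOT work:\n" + "\n".join(
            f"- {entry}" for entry in self.failure_ledger
        )
        # Rough token estimate (about 4 characters per token); an assumption.
        self.turns = [Turn(summary, tokens=len(summary) // 4)] + recent
```

The ledger lives outside the turn list, so it survives repeated compactions: each pass rebuilds the summary turn from the whole ledger rather than from the previous summary. Rewriting the start of the context in this way would likely invalidate any prompt cache keyed on that prefix, which is one reason a design might prefer infrequent compaction events over continuous trimming.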
