
sharaf/agentic-harness-architect

Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system — covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery (greenfield design or audit mode).

Quality: 100% (Does it follow best practices?)

Impact: 100%, 1.23x. Average score across 4 eval scenarios.

Security (by Snyk): Passed. No known issues.


evals/scenario-1/task.md

Automated GitHub Issue Resolver — Harness Design

Problem Description

A developer tools company wants to build an agent that automatically resolves GitHub issues in open-source repositories. The product works as follows: a user connects their repository, selects an open issue, and the agent checks out the repo, analyzes the codebase, writes a code fix, verifies it against the existing test suite, and opens a pull request — all without human involvement during execution.

A typical issue takes the agent 10 to 40 minutes to work through. Some complex issues (multi-file refactors or missing test coverage) can take 1 to 2 hours. The team has CI/CD infrastructure, including test runners, linters, and type checkers, and has access to frontier models. Cost per resolved issue matters, but quality is the primary concern: a PR that breaks tests or introduces regressions is worse than no PR at all.
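The "worse than no PR at all" constraint can be read as a hard gate in front of PR creation. The following is a minimal sketch under that reading, not a prescribed design: `CheckResult` and `should_open_pr` are hypothetical names, and each check is assumed to be a callable that wraps one of the team's existing tools (test runner, linter, type checker).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class CheckResult:
    name: str      # hypothetical label, e.g. "tests", "lint", "types"
    passed: bool


def should_open_pr(
    checks: Sequence[Callable[[], CheckResult]],
) -> Tuple[bool, List[str]]:
    """Run every check and allow a PR only if all of them pass.

    Returns (ok, names_of_failed_checks). An empty check list counts as a
    failure, so a misconfigured pipeline cannot ship an unverified PR.
    """
    results = [check() for check in checks]
    failed = [r.name for r in results if not r.passed]
    ok = bool(results) and not failed
    return ok, failed


if __name__ == "__main__":
    ok, failed = should_open_pr([
        lambda: CheckResult("tests", True),
        lambda: CheckResult("lint", False),
    ])
    print(ok, failed)
```

Running every check, instead of stopping at the first failure, gives the agent the full list of problems to address in its next attempt.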

The company has tried a basic single-loop prototype that produces inconsistent results: sometimes the agent declares success prematurely, sometimes it loops forever trying the same approach, and sometimes its edits corrupt files. They need a proper harness architecture to go into production.
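Each of the three prototype failures maps to a small, mechanical guard. The sketch below is illustrative only and assumes a harness written in Python. All names (`accept_completion`, `LoopGuard`, `atomic_write`) and the default limits are hypothetical and would need tuning against real runs.

```python
import hashlib
import os
import tempfile
from collections import Counter
from typing import Optional


def accept_completion(agent_claims_done: bool, tests_passed: bool) -> bool:
    """Premature success: the agent's claim alone is never sufficient."""
    return agent_claims_done and tests_passed


class LoopGuard:
    """Looping forever: stop on a step budget or on a repeated action."""

    def __init__(self, max_steps: int = 50, max_repeats: int = 3) -> None:
        self.max_steps = max_steps
        self.max_repeats = max_repeats
        self.steps = 0
        self.seen = Counter()  # action digest -> times attempted

    def record(self, action: str) -> Optional[str]:
        """Record one action; return a stop reason, or None to continue."""
        self.steps += 1
        digest = hashlib.sha256(action.encode("utf-8")).hexdigest()
        self.seen[digest] += 1
        if self.seen[digest] >= self.max_repeats:
            return "repeated_action"
        if self.steps >= self.max_steps:
            return "step_budget_exhausted"
        return None


def atomic_write(path: str, content: str) -> None:
    """Corrupted files: write to a temp file, then rename over the target.

    The temp file lives in the target's directory so os.replace stays on one
    filesystem; an interrupted write leaves the original file untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Exact-match hashing only catches literally identical actions. Detecting an agent that retries the same idea with small variations would need a looser similarity measure, which is one of the decisions the design document is expected to justify.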

Output Specification

Produce a comprehensive design document in Markdown saved as design.md. The document should cover every major architectural decision needed to take this system from prototype to production — from how many agents to use and what loop to run, through to how the system knows when it is genuinely done and how it recovers when things go wrong.

For each major decision in the document, include the rationale and note the conditions under which you would revisit the choice.

Also produce a brief decisions.md that lists, in a table, each key architectural decision and the single most important reason it was made.
