sharaf/agentic-harness-architect

Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system — covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery (greenfield design or audit mode).

Quality: 100% (Does it follow best practices?)

Impact: 100%, 1.23x (average score across 4 eval scenarios)

Security by Snyk: Passed, no known issues

Evaluation results

100%

39%

Automated GitHub Issue Resolver — Harness Design

Greenfield harness design document

| Criteria | Without context | With context |
| --- | --- | --- |
| Architecture level stated | 75% | 100% |
| Separate evaluator | 100% | 100% |
| Loop pattern matched to duration | 100% | 100% |
| Generator-Critic iteration cap | 50% | 100% |
| Tool count within budget | 0% | 100% |
| Content-based file editing | 37% | 100% |
| Absolute filepaths required | 0% | 100% |
| Architectural loop detection | 90% | 100% |
| Layered termination mechanisms | 75% | 100% |
| Context management strategy | 100% | 100% |
| Output sections complete | 62% | 100% |
| Reasoning budget differentiated | 25% | 100% |

100%

17%

Observation & Context Strategy for a Long-Running Refactoring Agent

Observation and context strategy design

| Criteria | Without context | With context |
| --- | --- | --- |
| Success silent, failure verbose | 100% | 100% |
| Token-based truncation limits | 50% | 100% |
| Truncation bands by output size | 100% | 100% |
| Error output preserved verbatim | 60% | 100% |
| FIC strategy for 3h+ tasks | 66% | 100% |
| Error history preserved during compaction | 100% | 100% |
| KV-cache preservation rules | 100% | 100% |
| Compaction preference hierarchy | 50% | 100% |
| Observation pipeline steps | 100% | 100% |
| Recovery hints on truncation | 100% | 100% |

100%

45%

System Prompt Architecture for a Code Migration Agent

System prompt architecture design

| Criteria | Without context | With context |
| --- | --- | --- |
| Three-tier permission structure | 100% | 100% |
| Positive framing of restrictions | 50% | 100% |
| Concrete code examples in prompt | 20% | 100% |
| Conditional logic specified | 75% | 100% |
| Error handling specified | 100% | 100% |
| Code style with examples | 0% | 100% |
| Just-in-time steering present | 10% | 100% |
| Evaluator prompt produced | 100% | 100% |
| Evaluator prompt aspirational language | 10% | 100% |
| Reasoning differentiation stated | 50% | 100% |
| Format and implementation details left implicit | 100% | 100% |

100%

13%

Evaluation Pipeline for an Automated Code Review Agent

Evaluation pipeline architecture

| Criteria | Without context | With context |
| --- | --- | --- |
| Evaluation stage ordering | 78% | 100% |
| LLM evaluation stage last | 100% | 100% |
| Separate evaluator model | 100% | 100% |
| Holistic rubric presentation (CRE) | 100% | 100% |
| Calibration approach specified | 100% | 100% |
| Precision over catch rate | 37% | 100% |
| Iteration limit stated | 100% | 100% |
| Two-stage judge for developer-facing output | 58% | 100% |
| Self-evaluation bias addressed | 100% | 100% |
| Deterministic tests before LLM | 100% | 100% |

Evaluated

Agent: Claude
Model: Claude Sonnet 4.6