sharaf/agentic-harness-architect

Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system — covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery (greenfield design or audit mode).

Quality: 100% (Does it follow best practices?)
Impact: 100%, 1.23x (average score across 4 eval scenarios)
Security by Snyk: Passed, no known issues

evals/scenario-1/criteria.json

{
  "context": "Tests whether the agent produces a correct greenfield design document for a coding agent harness, covering architecture selection, loop design, tool constraints, error handling, evaluation separation, and output completeness according to the skill's prescribed structure and benchmarks.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Architecture level stated",
      "description": "design.md explicitly states an architecture level (e.g. Level 0-4 or equivalent single/multi-agent decision) with a rationale",
      "max_score": 8
    },
    {
      "name": "Separate evaluator",
      "description": "design.md specifies that evaluation is architecturally separated from generation (a distinct evaluator component, not the generator self-evaluating)",
      "max_score": 10
    },
    {
      "name": "Loop pattern matched to duration",
      "description": "design.md selects a loop pattern appropriate for a minutes-to-hours quality-critical coding task (Generator-Critic or Build-Verify-Fix, not just ReAct)",
      "max_score": 8
    },
    {
      "name": "Generator-Critic iteration cap",
      "description": "design.md specifies an iteration cap of 3 (or 2-3) for the generator-critic or refinement loop — not an uncapped loop",
      "max_score": 8
    },
    {
      "name": "Tool count within budget",
      "description": "design.md specifies a tool count that falls within the range of 8-12 tools per agent (or explicitly states tools must stay under 20% of context budget)",
      "max_score": 8
    },
    {
      "name": "Content-based file editing",
      "description": "design.md specifies file editing using content-based or string-replacement strategy (str_replace or equivalent) — NOT line-number-based patches",
      "max_score": 8
    },
    {
      "name": "Absolute filepaths required",
      "description": "design.md specifies that agents must use absolute filepaths (not relative paths) for file operations",
      "max_score": 8
    },
    {
      "name": "Architectural loop detection",
      "description": "design.md includes a loop detection or LoopGuard mechanism implemented architecturally (not relying solely on the agent's own judgment)",
      "max_score": 10
    },
    {
      "name": "Layered termination mechanisms",
      "description": "design.md specifies at least two termination mechanisms beyond the task completing (e.g. hard iteration cap AND wall-clock timeout, or objective verification)",
      "max_score": 8
    },
    {
      "name": "Context management strategy",
      "description": "design.md specifies a context management strategy matched to the expected task duration (e.g. FIC at phase boundaries for 30min-3h tasks)",
      "max_score": 8
    },
    {
      "name": "Output sections complete",
      "description": "design.md contains at least 8 of these sections: Requirements Summary, Architecture Decision, Loop Design, Action Space, Observation Formatting Strategy, Context Management Strategy, Evaluation Design, Error Handling & Recovery, Prompt Architecture, Decomposition Strategy, Complexity Budget / Simplification Plan, Key Metrics to Track, Open Questions",
      "max_score": 8
    },
    {
      "name": "Reasoning budget differentiated",
      "description": "design.md recommends differentiated reasoning budget (higher reasoning for planning/verification phases, lower for implementation) rather than uniform reasoning level throughout",
      "max_score": 8
    }
  ]
}
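
The twelve `max_score` values above sum to 100, so a checklist of this shape can be read as a percentage rubric. The source does not say how the eval harness aggregates the per-item scores; the sketch below assumes the simplest rule (sum the awarded points, clamping each to its `max_score`). The `score_checklist` function, the `awarded` mapping, and the three-item excerpt are hypothetical names and data used only for illustration.

```python
import json

# Hypothetical three-item excerpt of criteria.json (descriptions omitted).
CRITERIA = json.loads("""
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "Architecture level stated", "max_score": 8},
    {"name": "Separate evaluator", "max_score": 10},
    {"name": "Architectural loop detection", "max_score": 10}
  ]
}
""")


def score_checklist(criteria, awarded):
    """Return (earned, possible, percent) for a weighted checklist.

    Assumption: the total is a plain sum of per-item points, with each
    item clamped to the range [0, max_score]. Items missing from
    `awarded` count as zero.
    """
    earned = 0
    possible = 0
    for item in criteria["checklist"]:
        cap = item["max_score"]
        raw = awarded.get(item["name"], 0)
        earned += max(0, min(raw, cap))  # never exceed the item's weight
        possible += cap
    percent = 100.0 * earned / possible if possible else 0.0
    return earned, possible, percent


if __name__ == "__main__":
    grades = {"Architecture level stated": 8, "Separate evaluator": 5}
    print(score_checklist(CRITERIA, grades))
```

Clamping keeps an over-generous grader from pushing one criterion past its weight, which is one reasonable reading of `max_score`; a real harness may handle this differently.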
