Name: sharaf/agentic-harness-architect
Rating: 100 (1 review)
Author: sharaf

Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system — covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery (greenfield design or audit mode).

Quality: 100% (Does it follow best practices?)
Impact: 100%, 1.23x (average score across 4 eval scenarios)
Security: Passed, no known issues

{
  "context": "Tests whether the agent produces a correct observation formatting and context management specification for a long-running coding agent, applying the skill's specific rules for truncation strategy by size band, token-based limits, error output preservation, success-silent/failure-verbose, FIC strategy for 3h+ tasks, error history preservation, KV-cache preservation, compaction preference hierarchy, and observation pipeline structure.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Success silent, failure verbose",
      "description": "observation-context-spec.md specifies that passing tests produce only a count (not full output), while failing tests include full output — applying different verbosity rules based on success/failure",
      "max_score": 12
    },
    {
      "name": "Token-based truncation limits",
      "description": "observation-context-spec.md specifies token-based limits for truncation (not line-based), and includes an approximation method (e.g. num_bytes / 4) or references token counts directly",
      "max_score": 10
    },
    {
      "name": "Truncation bands by output size",
      "description": "observation-context-spec.md defines different handling strategies for at least 3 distinct output size bands (e.g. small pass-through, medium summarize, large offload to disk)",
      "max_score": 10
    },
    {
      "name": "Error output preserved verbatim",
      "description": "observation-context-spec.md specifies that error output is preserved verbatim regardless of size (up to a stated limit, e.g. ~5K tokens), rather than being truncated like other output",
      "max_score": 10
    },
    {
      "name": "FIC strategy for 3h+ tasks",
      "description": "observation-context-spec.md specifies context compaction at phase boundaries (Frequent Intentional Compaction) as the strategy for 3h+ sessions, with a target utilization range (e.g. 40-60%)",
      "max_score": 12
    },
    {
      "name": "Error history preserved during compaction",
      "description": "observation-context-spec.md explicitly states that failed attempts and error traces must NOT be removed during compaction or summarization",
      "max_score": 10
    },
    {
      "name": "KV-cache preservation rules",
      "description": "observation-context-spec.md includes at least one KV-cache preservation rule: stable prefixes, append-only modifications, no tool reordering mid-session, or avoidance of timestamps in system prompts",
      "max_score": 10
    },
    {
      "name": "Compaction preference hierarchy",
      "description": "observation-context-spec.md ranks compaction approaches in quality order, distinguishing between raw context retention, tool result clearing, observation masking, and summarization — with summarization recommended only as a last resort",
      "max_score": 8
    },
    {
      "name": "Observation pipeline steps",
      "description": "observation-context-spec.md describes a multi-step processing pipeline for tool outputs (e.g. capture → transform → size check → compress → inject metadata → position) rather than treating all output the same way",
      "max_score": 8
    },
    {
      "name": "Recovery hints on truncation",
      "description": "observation-context-spec.md specifies that when output is truncated or offloaded, the agent receives metadata indicating what was removed and how to retrieve it (recovery hints)",
      "max_score": 10
    }
  ]
}

File: evals/scenario-2/criteria.json
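
Several of the checklist items in criteria.json (token-based limits, size bands, verbatim errors, success-silent/failure-verbose, recovery hints) can be illustrated together. The following is a minimal Python sketch under stated assumptions, not the skill's own implementation: the band thresholds, the names `format_observation`, `format_test_result`, and `Observation`, and the `offload` callback are all hypothetical. Only the bytes / 4 token estimate and the ~5K-token error cap come from the examples in the checklist text. The medium band uses head-plus-tail truncation as a simple stand-in for the "summarize" step the checklist mentions.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical band thresholds; the checklist only requires at least 3 bands.
SMALL_MAX_TOKENS = 500
MEDIUM_MAX_TOKENS = 5_000
# The checklist gives "~5K tokens" as an example cap for verbatim errors.
ERROR_MAX_TOKENS = 5_000


def estimate_tokens(text: str) -> int:
    """Approximate the token count as num_bytes / 4 (checklist example)."""
    return len(text.encode("utf-8")) // 4


@dataclass
class Observation:
    body: str
    truncated: bool = False
    recovery_hint: str = ""  # tells the agent what was cut and where it lives


def format_observation(
    output: str, is_error: bool, offload: Callable[[str], str]
) -> Observation:
    """Apply size-band handling; `offload` stores text and returns a handle."""
    tokens = estimate_tokens(output)
    # Small band, or error output within the cap: pass through verbatim.
    if tokens <= SMALL_MAX_TOKENS or (is_error and tokens <= ERROR_MAX_TOKENS):
        return Observation(body=output)

    handle = offload(output)  # the full text stays retrievable
    if tokens <= MEDIUM_MAX_TOKENS:
        # Medium band: keep head and tail (stand-in for summarization).
        keep = min(SMALL_MAX_TOKENS * 2, len(output) // 2)
        body = output[:keep] + "\n[...]\n" + output[-keep:]
    else:
        # Large band: keep only a short preview in context.
        body = output[:200]
    hint = f"~{tokens} tokens total; full output saved to {handle}"
    return Observation(body=body, truncated=True, recovery_hint=hint)


def format_test_result(passed: int, failed: int, full_output: str) -> str:
    """Success silent, failure verbose: a count on pass, full output on fail."""
    if failed == 0:
        return f"{passed} passed"
    return f"{passed} passed, {failed} failed\n{full_output}"
```

In a real harness, `offload` would typically write to disk and return a path; it is a callback here so the sketch stays free of side effects. The limits are expressed in estimated tokens rather than lines, which is the distinction the "Token-based truncation limits" item checks for.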