Name: sharaf/agentic-harness-architect
Rating: 100 (1 reviews)
Author: sharaf

Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system — covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery (greenfield design or audit mode).

Quality: 100% (Does it follow best practices?)
Impact: 100% (1.23x; average score across 4 eval scenarios)
Security: Passed (no known issues)

{
  "context": "Tests whether the agent designs a system prompt architecture following the skill's prompt engineering guidelines: Right Altitude Framework (start minimal), three-tier permission structure, positive framing of restrictions, concrete code examples, specification granularity rules, just-in-time steering, evaluator prompt investment, and aspirational evaluator language.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Three-tier permission structure",
      "description": "system-prompt.md organizes agent permissions into three tiers: what the agent should always do, what it should ask before doing, and what it must never do — using these or equivalent labels",
      "max_score": 12
    },
    {
      "name": "Positive framing of restrictions",
      "description": "system-prompt.md states prohibitions as positive directives (e.g. 'use str_replace for edits' rather than 'do not use line numbers') — does NOT use purely negative phrasing for the majority of constraints",
      "max_score": 8
    },
    {
      "name": "Concrete code examples in prompt",
      "description": "system-prompt.md includes at least one concrete code example (a code block or inline snippet) to illustrate a style or behavior rule — not just prose description",
      "max_score": 10
    },
    {
      "name": "Conditional logic specified",
      "description": "system-prompt.md explicitly specifies conditional logic for at least one behavior (e.g. 'if the file is >500 lines, do X; otherwise do Y' or equivalent conditional branching)",
      "max_score": 8
    },
    {
      "name": "Error handling specified",
      "description": "system-prompt.md explicitly specifies what the agent should do when it encounters an error or unexpected situation — not left implicit",
      "max_score": 8
    },
    {
      "name": "Code style with examples",
      "description": "system-prompt.md specifies code style guidance accompanied by a concrete example (not text description alone)",
      "max_score": 8
    },
    {
      "name": "Just-in-time steering present",
      "description": "prompt-architecture.md or system-prompt.md describes injecting guidance at decision points (just-in-time / contextual steering) rather than loading all instructions upfront in the system prompt",
      "max_score": 10
    },
    {
      "name": "Evaluator prompt produced",
      "description": "evaluator-prompt.md exists and contains a complete prompt (not a placeholder or skeleton) — at minimum a full paragraph of evaluator instructions",
      "max_score": 8
    },
    {
      "name": "Evaluator prompt aspirational language",
      "description": "evaluator-prompt.md uses aspirational or quality-oriented language (e.g. 'expert reviewer', 'exceptional migration', 'highest standard') rather than purely utilitarian phrasing",
      "max_score": 10
    },
    {
      "name": "Reasoning differentiation stated",
      "description": "prompt-architecture.md mentions a differentiated reasoning budget: higher reasoning for planning or verification phases and lower or standard reasoning for implementation, rather than a single uniform level",
      "max_score": 10
    },
    {
      "name": "Format and implementation details left implicit",
      "description": "prompt-architecture.md explicitly notes that format choices or implementation details are left to the agent's judgment — not over-specified",
      "max_score": 8
    }
  ]
}

File: evals/scenario-3/criteria.json
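The checklist above is a weighted rubric whose `max_score` values add up to exactly 100. The page does not say how the grader awards points per criterion, so the sketch below assumes that each criterion receives an integer award between 0 and its `max_score`, and that the scenario score is the sum of awards as a percentage of the total available points. The helper names `validate_checklist` and `weighted_score`, and the sample awards, are hypothetical and exist only for illustration.

```python
import json

# Trimmed copy of the criteria file: only the fields the helpers below use.
CRITERIA_JSON = """
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "Three-tier permission structure", "max_score": 12},
    {"name": "Positive framing of restrictions", "max_score": 8},
    {"name": "Concrete code examples in prompt", "max_score": 10},
    {"name": "Conditional logic specified", "max_score": 8},
    {"name": "Error handling specified", "max_score": 8},
    {"name": "Code style with examples", "max_score": 8},
    {"name": "Just-in-time steering present", "max_score": 10},
    {"name": "Evaluator prompt produced", "max_score": 8},
    {"name": "Evaluator prompt aspirational language", "max_score": 10},
    {"name": "Reasoning differentiation stated", "max_score": 10},
    {"name": "Format and implementation details left implicit", "max_score": 8}
  ]
}
"""


def validate_checklist(spec: dict) -> int:
    """Hypothetical helper: check the rubric's shape and return total points."""
    if spec.get("type") != "weighted_checklist":
        raise ValueError(f"unexpected rubric type: {spec.get('type')!r}")
    names = [item["name"] for item in spec["checklist"]]
    if len(names) != len(set(names)):
        raise ValueError("duplicate criterion names")
    for item in spec["checklist"]:
        if not isinstance(item["max_score"], int) or item["max_score"] <= 0:
            raise ValueError(f"bad max_score for {item['name']!r}")
    return sum(item["max_score"] for item in spec["checklist"])


def weighted_score(spec: dict, awards: dict) -> float:
    """Hypothetical helper: percentage score from per-criterion awards.

    Assumptions: awards are clamped to [0, max_score], and criteria that
    are missing from `awards` count as 0.
    """
    total = validate_checklist(spec)
    earned = 0
    for item in spec["checklist"]:
        raw = awards.get(item["name"], 0)
        earned += min(max(raw, 0), item["max_score"])
    return 100.0 * earned / total


spec = json.loads(CRITERIA_JSON)
print(validate_checklist(spec))  # prints 100

# Hypothetical grader output: full marks except on two criteria.
awards = {item["name"]: item["max_score"] for item in spec["checklist"]}
awards["Reasoning differentiation stated"] = 5
awards["Format and implementation details left implicit"] = 0
print(weighted_score(spec, awards))  # prints 87.0
```

Clamping out-of-range awards and scoring missing criteria as zero are choices made for this sketch. A stricter grader might reject such input instead, which surfaces grading bugs earlier.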