Design, build, or audit a coding agent, agentic loop, tool-use harness, or autonomous coding system — covering loop architecture, action space, context strategy, observation formatting, evaluation, error handling, prompt engineering, and task decomposition. Use when the user wants to design an agent, build a coding agent, scaffold an agentic system, architect a tool-use loop, review an existing agent harness for improvements, fix context bloat or compaction problems, tune observation formatting or tool output handling, debug agent loop or termination issues, design a system prompt or evaluator prompt for an agent, set up or redesign an agent evaluation pipeline, plan multi-agent orchestration, or specify how an agent should manage context, tools, prompts, evaluation, or recovery (greenfield design or audit mode).
100
100%
Does it follow best practices?
Impact
100%
1.23x average score across 4 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent designs a system prompt architecture following the skill's prompt engineering guidelines: Right Altitude Framework (start minimal), three-tier permission structure, positive framing of restrictions, concrete code examples, specification granularity rules, just-in-time steering, evaluator prompt investment, and aspirational evaluator language.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Three-tier permission structure",
"description": "system-prompt.md organizes agent permissions into three tiers: what the agent should always do, what it should ask before doing, and what it must never do — using these or equivalent labels",
"max_score": 12
},
{
"name": "Positive framing of restrictions",
"description": "system-prompt.md states prohibitions as positive directives (e.g. 'use str_replace for edits' rather than 'do not use line numbers') — does NOT use purely negative phrasing for the majority of constraints",
"max_score": 8
},
{
"name": "Concrete code examples in prompt",
"description": "system-prompt.md includes at least one concrete code example (a code block or inline snippet) to illustrate a style or behavior rule — not just prose description",
"max_score": 10
},
{
"name": "Conditional logic specified",
"description": "system-prompt.md explicitly specifies conditional logic for at least one behavior (e.g. 'if the file is >500 lines, do X; otherwise do Y' or equivalent conditional branching)",
"max_score": 8
},
{
"name": "Error handling specified",
"description": "system-prompt.md explicitly specifies what the agent should do when it encounters an error or unexpected situation — not left implicit",
"max_score": 8
},
{
"name": "Code style with examples",
"description": "system-prompt.md specifies code style guidance accompanied by a concrete example (not text description alone)",
"max_score": 8
},
{
"name": "Just-in-time steering present",
"description": "prompt-architecture.md or system-prompt.md describes injecting guidance at decision points (just-in-time / contextual steering) rather than loading all instructions upfront in the system prompt",
"max_score": 10
},
{
"name": "Evaluator prompt produced",
"description": "evaluator-prompt.md exists and contains a complete prompt (not a placeholder or skeleton) — at minimum a full paragraph of evaluator instructions",
"max_score": 8
},
{
"name": "Evaluator prompt aspirational language",
"description": "evaluator-prompt.md uses aspirational or quality-oriented language (e.g. 'expert reviewer', 'exceptional migration', 'highest standard') rather than purely utilitarian phrasing",
"max_score": 10
},
{
"name": "Reasoning differentiation stated",
"description": "prompt-architecture.md mentions differentiated reasoning budget: higher reasoning for planning or verification phases and lower or standard for implementation — not a single uniform level",
"max_score": 10
},
{
"name": "Format and implementation details left implicit",
"description": "prompt-architecture.md explicitly notes that format choices or implementation details are left to the agent's judgment — not over-specified",
"max_score": 8
}
]
}
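The criteria file declares `"type": "weighted_checklist"` but does not say how per-item awards are combined into a final score. The sketch below shows one plausible aggregation, assuming each item's award is clamped to its `max_score`, the awards are summed, and the total is reported as a percentage of the summed weights. The function name `score_checklist` and the `awarded` mapping are hypothetical, not part of the source; the item names and weights are copied from the checklist above.

```python
# Weights copied from the checklist above: (name, max_score).
CHECKLIST = [
    ("Three-tier permission structure", 12),
    ("Positive framing of restrictions", 8),
    ("Concrete code examples in prompt", 10),
    ("Conditional logic specified", 8),
    ("Error handling specified", 8),
    ("Code style with examples", 8),
    ("Just-in-time steering present", 10),
    ("Evaluator prompt produced", 8),
    ("Evaluator prompt aspirational language", 10),
    ("Reasoning differentiation stated", 10),
    ("Format and implementation details left implicit", 8),
]


def score_checklist(checklist, awarded):
    """Aggregate per-item awards into (earned, possible, percent).

    ASSUMPTION: awards are clamped to [0, max_score] and summed. The source
    does not document the grader's actual aggregation rule.
    `awarded` is a hypothetical mapping of item name -> points given;
    items missing from it count as 0.
    """
    earned = 0
    possible = 0
    for name, max_score in checklist:
        points = awarded.get(name, 0)
        earned += max(0, min(points, max_score))  # clamp to the item's cap
        possible += max_score
    percent = 100.0 * earned / possible if possible else 0.0
    return earned, possible, percent


# Example: full marks everywhere except partial credit on one item.
awards = {name: cap for name, cap in CHECKLIST}
awards["Conditional logic specified"] = 4
earned, possible, percent = score_checklist(CHECKLIST, awards)
print(earned, possible, percent)
```

The eleven weights sum to 100, so under this assumption the earned total reads directly as a percentage.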