Write or audit AI agent system prompts component-by-component across identity, instruction architecture, behavioral constraints, tools, examples, context strategy, output format, and error handling. Use when the user wants to design a new agent prompt, write a system prompt, review an existing agent prompt, fix tool-use instructions, audit prompt structure, improve context strategy, tune output formats, or define error handling for single-agent or multi-agent systems.
100
100%
Does it follow best practices?
Impact
100%
1.33x
Average score across 3 eval scenarios
Passed
No known issues
{
  "context": "Tests whether write mode produces a complete agent system prompt with required first actions, design decisions, scoped persona, tool descriptions, context strategy, output contract, error handling, escalation, and iteration guidance.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "First Actions section",
      "description": "The response starts with or clearly includes a filled '## First Actions' section that states write mode and summarizes checked inputs before writing the prompt.",
      "max_score": 8
    },
    {
      "name": "Design Decisions section",
      "description": "The response includes design notes or a '## Design Decisions' section explaining key choices such as persona depth, Claude-oriented structure, tool handling, constraint framing, context handling, and output format.",
      "max_score": 8
    },
    {
      "name": "System Prompt section",
      "description": "The response includes a clearly separated '## System Prompt' section containing the generated prompt text, not only advice about how to write one.",
      "max_score": 10
    },
    {
      "name": "Scope boundaries",
      "description": "The generated prompt defines what the agent handles and what is out of scope, including escalation or refusal behavior for non-billing requests.",
      "max_score": 8
    },
    {
      "name": "Tool definitions",
      "description": "Each of the four tools is described with when to use it, what its parameters mean, and a caveat or result-handling instruction.",
      "max_score": 10
    },
    {
      "name": "Escalation triggers",
      "description": "The generated prompt lists concrete escalation triggers including high-dollar refunds, legal or chargeback threats, security incidents, and repeated failed attempts.",
      "max_score": 8
    },
    {
      "name": "Tool failure handling",
      "description": "The generated prompt instructs the agent not to treat empty or failed tool results as definitive proof of absence, and gives a recovery or escalation path.",
      "max_score": 8
    },
    {
      "name": "Context strategy",
      "description": "The response addresses multi-turn context handling, including summary or handoff context and keeping static prompt content separate from dynamic conversation/tool data.",
      "max_score": 8
    },
    {
      "name": "Output format",
      "description": "The generated prompt specifies the customer-facing response format for normal answers and escalations, using Markdown or another clear structure.",
      "max_score": 8
    },
    {
      "name": "Critical rule sandwich",
      "description": "Critical safety, scope, or escalation rules appear near both the beginning and the end of the generated prompt or are explicitly repeated as final instructions.",
      "max_score": 8
    },
    {
      "name": "How to Iterate",
      "description": "The response includes a '## How to Iterate' section or equivalent with evaluation guidance such as golden cases, deterministic graders, LLM judges, one-variable changes, and stop criteria.",
      "max_score": 8
    },
    {
      "name": "Positive framing",
      "description": "Non-safety constraints are primarily framed as actions to take rather than only as negative prohibitions; absolute negative language is reserved for high-risk restrictions.",
      "max_score": 6
    }
  ]
}
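
The rubric does not say how a `weighted_checklist` is turned into a final score, so the following is only a minimal sketch under an assumed rule: sum the points awarded per criterion, clamp each to its `max_score`, and divide by the total possible. The function name `score_checklist` and the `awarded` mapping are hypothetical, introduced for illustration; the criterion names and `max_score` values are copied from the checklist above.

```python
# Criterion names and max_score values copied from the rubric above;
# descriptions are omitted for brevity.
CHECKLIST = [
    {"name": "First Actions section", "max_score": 8},
    {"name": "Design Decisions section", "max_score": 8},
    {"name": "System Prompt section", "max_score": 10},
    {"name": "Scope boundaries", "max_score": 8},
    {"name": "Tool definitions", "max_score": 10},
    {"name": "Escalation triggers", "max_score": 8},
    {"name": "Tool failure handling", "max_score": 8},
    {"name": "Context strategy", "max_score": 8},
    {"name": "Output format", "max_score": 8},
    {"name": "Critical rule sandwich", "max_score": 8},
    {"name": "How to Iterate", "max_score": 8},
    {"name": "Positive framing", "max_score": 6},
]


def score_checklist(checklist, awarded):
    """Hypothetical scorer: returns (earned, possible, percent).

    `awarded` maps a criterion name to the points a grader gave it.
    Missing criteria count as 0. The clamping and the percentage
    normalization are assumptions, not something the rubric specifies.
    """
    possible = sum(item["max_score"] for item in checklist)
    earned = 0
    for item in checklist:
        points = awarded.get(item["name"], 0)
        # Keep each criterion within 0..max_score so a grader cannot over-award.
        earned += max(0, min(points, item["max_score"]))
    percent = 100.0 * earned / possible if possible else 0.0
    return earned, possible, percent


if __name__ == "__main__":
    full_marks = {item["name"]: item["max_score"] for item in CHECKLIST}
    print(score_checklist(CHECKLIST, full_marks))
```

Under this assumed rule the weights sum to 98 rather than 100, so a percentage only reaches 100% after normalizing by the total possible points, as the sketch does.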