Write or audit AI agent system prompts component-by-component across identity, instruction architecture, behavioral constraints, tools, examples, context strategy, output format, and error handling. Use when the user wants to design a new agent prompt, write a system prompt, review an existing agent prompt, fix tool-use instructions, audit prompt structure, improve context strategy, tune output formats, or define error handling for single-agent or multi-agent systems.
100
100%
Does it follow best practices?
Impact
100%
1.33x Average score across 3 eval scenarios
Passed
No known issues
{
  "context": "Tests whether audit mode produces a structured prompt audit with first actions, component scores, prioritized findings, specific evidence, concrete rewrites, and open questions.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Audit title and First Actions",
      "description": "The response is clearly an agent prompt audit and includes a filled '## First Actions' section that states audit mode and identifies the existing prompt as the required artifact.",
      "max_score": 10
    },
    {
      "name": "Component Scores",
      "description": "The response includes a '## Component Scores' section or equivalent that scores each applicable component as pass, issue, or n/a.",
      "max_score": 10
    },
    {
      "name": "Highest-impact fixes",
      "description": "The response includes a prioritized list of the top fixes, ordered by expected behavioral impact rather than document order.",
      "max_score": 8
    },
    {
      "name": "Specific evidence",
      "description": "Each major finding cites a specific phrase, section, or line from the original prompt instead of making only generic claims.",
      "max_score": 10
    },
    {
      "name": "Concrete rewrite blocks",
      "description": "The response includes replacement prompt text for weak areas, not only descriptions of what should be improved.",
      "max_score": 10
    },
    {
      "name": "Tool definition diagnosis",
      "description": "The audit flags that the tool list lacks descriptions, parameter guidance, use conditions, result-handling instructions, or overlap guidance.",
      "max_score": 8
    },
    {
      "name": "Retrieval failure handling",
      "description": "The audit identifies the instruction to answer from general knowledge after no search results as a risk and replaces it with deterministic uncertainty or recovery behavior.",
      "max_score": 8
    },
    {
      "name": "Output format diagnosis",
      "description": "The audit flags that 'Answer in bullets' is insufficient and proposes a more reliable response structure with evidence, uncertainty, and next steps.",
      "max_score": 8
    },
    {
      "name": "Constraint framing",
      "description": "The audit identifies overuse or weak use of vague never/always constraints and suggests clearer positive instructions plus reserved safety-critical prohibitions.",
      "max_score": 8
    },
    {
      "name": "Priority labels",
      "description": "Every detailed finding has a high, medium, or low priority label.",
      "max_score": 6
    },
    {
      "name": "Open Questions",
      "description": "The response includes an '## Open Questions' section or equivalent listing missing information that would affect the final prompt design.",
      "max_score": 6
    },
    {
      "name": "No vague findings",
      "description": "Findings are specific and actionable; the response avoids vague advice like 'make it better' without an exact fix.",
      "max_score": 8
    }
  ]
}
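The rubric does not say how a `weighted_checklist` is rolled up into a single score. The sketch below shows one plausible reading, under stated assumptions: each criterion earns between 0 and its `max_score`, and the result is earned points divided by available points. The function name `score_checklist`, the `awarded` mapping, and the three-item excerpt are hypothetical and exist only for illustration; they are not part of the eval harness.

```python
import json

# Hypothetical excerpt of the rubric above (three of the twelve criteria).
RUBRIC_JSON = """
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "Component Scores", "max_score": 10},
    {"name": "Highest-impact fixes", "max_score": 8},
    {"name": "Priority labels", "max_score": 6}
  ]
}
"""


def score_checklist(rubric: dict, awarded: dict[str, float]) -> float:
    """Return the weighted score as a percentage of the available points.

    Assumption (not stated in the rubric): the total is the sum of earned
    points divided by the sum of all max_score values.
    """
    if rubric.get("type") != "weighted_checklist":
        raise ValueError("expected a weighted_checklist rubric")
    total_max = 0.0
    total_awarded = 0.0
    for item in rubric["checklist"]:
        max_score = item["max_score"]
        # Criteria missing from `awarded` count as zero; clamp to [0, max_score].
        points = min(max(awarded.get(item["name"], 0), 0), max_score)
        total_max += max_score
        total_awarded += points
    if total_max == 0:
        raise ValueError("rubric has no scorable criteria")
    return 100.0 * total_awarded / total_max


rubric = json.loads(RUBRIC_JSON)
grades = {"Component Scores": 10, "Highest-impact fixes": 8}
print(score_checklist(rubric, grades))  # prints 75.0 (18 of 24 points)
```

The twelve `max_score` values in the full rubric sum to 100, so under this assumption the percentage equals the raw point total.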