Name: sharaf/agent-prompt-engineer
Rating: 100 (1 review)
Author: sharaf

Write or audit AI agent system prompts component-by-component across identity, instruction architecture, behavioral constraints, tools, examples, context strategy, output format, and error handling. Use when the user wants to design a new agent prompt, write a system prompt, review an existing agent prompt, fix tool-use instructions, audit prompt structure, improve context strategy, tune output formats, or define error handling for single-agent or multi-agent systems.

Quality: 100% (Does it follow best practices?)
Impact: 100%, 1.33x (average score across 3 eval scenarios)
Security: Passed (no known issues)

{
  "context": "Tests whether audit mode produces a structured prompt audit with first actions, component scores, prioritized findings, specific evidence, concrete rewrites, and open questions.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Audit title and First Actions",
      "description": "The response is clearly an agent prompt audit and includes a filled '## First Actions' section that states audit mode and identifies the existing prompt as the required artifact.",
      "max_score": 10
    },
    {
      "name": "Component Scores",
      "description": "The response includes a '## Component Scores' section or equivalent that scores each applicable component as pass, issue, or n/a.",
      "max_score": 10
    },
    {
      "name": "Highest-impact fixes",
      "description": "The response includes a prioritized list of the top fixes, ordered by expected behavioral impact rather than document order.",
      "max_score": 8
    },
    {
      "name": "Specific evidence",
      "description": "Each major finding cites a specific phrase, section, or line from the original prompt instead of making only generic claims.",
      "max_score": 10
    },
    {
      "name": "Concrete rewrite blocks",
      "description": "The response includes replacement prompt text for weak areas, not only descriptions of what should be improved.",
      "max_score": 10
    },
    {
      "name": "Tool definition diagnosis",
      "description": "The audit flags that the tool list lacks descriptions, parameter guidance, use conditions, result-handling instructions, or overlap guidance.",
      "max_score": 8
    },
    {
      "name": "Retrieval failure handling",
      "description": "The audit identifies the instruction to answer from general knowledge after no search results as a risk and replaces it with deterministic uncertainty or recovery behavior.",
      "max_score": 8
    },
    {
      "name": "Output format diagnosis",
      "description": "The audit flags that 'Answer in bullets' is insufficient and proposes a more reliable response structure with evidence, uncertainty, and next steps.",
      "max_score": 8
    },
    {
      "name": "Constraint framing",
      "description": "The audit identifies overuse or weak use of vague never/always constraints and suggests clearer positive instructions plus reserved safety-critical prohibitions.",
      "max_score": 8
    },
    {
      "name": "Priority labels",
      "description": "Every detailed finding has a high, medium, or low priority label.",
      "max_score": 6
    },
    {
      "name": "Open Questions",
      "description": "The response includes an '## Open Questions' section or equivalent listing missing information that would affect the final prompt design.",
      "max_score": 6
    },
    {
      "name": "No vague findings",
      "description": "Findings are specific and actionable; the response avoids vague advice like 'make it better' without an exact fix.",
      "max_score": 8
    }
  ]
}
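The `max_score` values in the checklist above sum to 100, so each one can be read as that criterion's share of the total. The page does not say how awarded points are combined, so the following is only a minimal sketch under one plausible assumption: the final score is earned points divided by available points, as a percentage. The function `score_checklist`, the `awarded` mapping, and the clamping rule are hypothetical; the excerpt reuses three criterion names and weights from the file above.

```python
import json

# Hypothetical excerpt of a weighted_checklist criteria file; the names and
# max_score values are copied from three of the criteria above.
CRITERIA_JSON = """
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "Component Scores", "max_score": 10},
    {"name": "Highest-impact fixes", "max_score": 8},
    {"name": "Priority labels", "max_score": 6}
  ]
}
"""


def score_checklist(criteria: dict, awarded: dict) -> float:
    """Return earned points as a percentage of available points.

    This scoring rule is an assumption, not the registry's documented method.
    """
    total = sum(item["max_score"] for item in criteria["checklist"])
    if total == 0:
        raise ValueError("checklist has no points to award")
    earned = 0
    for item in criteria["checklist"]:
        points = awarded.get(item["name"], 0)  # unscored criteria earn 0
        # Clamp so a grade can neither exceed a criterion's weight nor go negative.
        earned += max(0, min(points, item["max_score"]))
    return 100.0 * earned / total


criteria = json.loads(CRITERIA_JSON)
# Hypothetical grades: full marks on two criteria, half marks on one.
awarded = {"Component Scores": 10, "Highest-impact fixes": 4, "Priority labels": 6}
print(f"{score_checklist(criteria, awarded):.1f}%")
```

Clamping each grade to its `max_score` keeps a single over-scored criterion from masking misses elsewhere, which matters when the weights are meant to reflect relative importance.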

evals

scenario-1

scenario-2

scenario-3

evals/scenario-2/criteria.json