Name: sharaf/agent-prompt-engineer
Rating: 100 (1 review)
Author: sharaf

Write or audit AI agent system prompts component-by-component across identity, instruction architecture, behavioral constraints, tools, examples, context strategy, output format, and error handling. Use when the user wants to design a new agent prompt, write a system prompt, review an existing agent prompt, fix tool-use instructions, audit prompt structure, improve context strategy, tune output formats, or define error handling for single-agent or multi-agent systems.

Quality: 100% (Does it follow best practices?)
Impact: 100%, 1.33x (average score across 3 eval scenarios)
Security: Passed (no known issues)

{
  "context": "Tests whether write mode produces a complete agent system prompt with required first actions, design decisions, scoped persona, tool descriptions, context strategy, output contract, error handling, escalation, and iteration guidance.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "First Actions section",
      "description": "The response starts with or clearly includes a filled '## First Actions' section that states write mode and summarizes checked inputs before writing the prompt.",
      "max_score": 8
    },
    {
      "name": "Design Decisions section",
      "description": "The response includes design notes or a '## Design Decisions' section explaining key choices such as persona depth, Claude-oriented structure, tool handling, constraint framing, context handling, and output format.",
      "max_score": 8
    },
    {
      "name": "System Prompt section",
      "description": "The response includes a clearly separated '## System Prompt' section containing the generated prompt text, not only advice about how to write one.",
      "max_score": 10
    },
    {
      "name": "Scope boundaries",
      "description": "The generated prompt defines what the agent handles and what is out of scope, including escalation or refusal behavior for non-billing requests.",
      "max_score": 8
    },
    {
      "name": "Tool definitions",
      "description": "Each of the four tools is described with when to use it, what its parameters mean, and a caveat or result-handling instruction.",
      "max_score": 10
    },
    {
      "name": "Escalation triggers",
      "description": "The generated prompt lists concrete escalation triggers including high-dollar refunds, legal or chargeback threats, security incidents, and repeated failed attempts.",
      "max_score": 8
    },
    {
      "name": "Tool failure handling",
      "description": "The generated prompt instructs the agent not to treat empty or failed tool results as definitive proof of absence, and gives a recovery or escalation path.",
      "max_score": 8
    },
    {
      "name": "Context strategy",
      "description": "The response addresses multi-turn context handling, including summary or handoff context and keeping static prompt content separate from dynamic conversation/tool data.",
      "max_score": 8
    },
    {
      "name": "Output format",
      "description": "The generated prompt specifies the customer-facing response format for normal answers and escalations, using Markdown or another clear structure.",
      "max_score": 8
    },
    {
      "name": "Critical rule sandwich",
      "description": "Critical safety, scope, or escalation rules appear near both the beginning and the end of the generated prompt or are explicitly repeated as final instructions.",
      "max_score": 8
    },
    {
      "name": "How to Iterate",
      "description": "The response includes a '## How to Iterate' section or equivalent with evaluation guidance such as golden cases, deterministic graders, LLM judges, one-variable changes, and stop criteria.",
      "max_score": 8
    },
    {
      "name": "Positive framing",
      "description": "Non-safety constraints are primarily framed as actions to take rather than only as negative prohibitions; absolute negative language is reserved for high-risk restrictions.",
      "max_score": 6
    }
  ]
}
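
The checklist above assigns each criterion a `max_score`, but the listing does not say how a grader combines them. As a minimal sketch, assuming items are simply summed and reported as a percentage of the maximum, the aggregation could look like the following. The function name `weighted_score` and the `awarded` mapping are hypothetical and are not part of the package; only the criterion names and `max_score` values come from the JSON above.

```python
import json

# Names and max_score values copied from the checklist above
# (descriptions omitted for brevity).
CRITERIA_JSON = """
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "First Actions section", "max_score": 8},
    {"name": "Design Decisions section", "max_score": 8},
    {"name": "System Prompt section", "max_score": 10},
    {"name": "Scope boundaries", "max_score": 8},
    {"name": "Tool definitions", "max_score": 10},
    {"name": "Escalation triggers", "max_score": 8},
    {"name": "Tool failure handling", "max_score": 8},
    {"name": "Context strategy", "max_score": 8},
    {"name": "Output format", "max_score": 8},
    {"name": "Critical rule sandwich", "max_score": 8},
    {"name": "How to Iterate", "max_score": 8},
    {"name": "Positive framing", "max_score": 6}
  ]
}
"""


def weighted_score(criteria: dict, awarded: dict[str, float]) -> float:
    """Return the awarded total as a percentage of the maximum total.

    Assumption: items aggregate by plain summation. The real grader's
    aggregation rule is not documented in the listing.
    """
    items = criteria["checklist"]
    total_max = sum(item["max_score"] for item in items)
    total = 0.0
    for item in items:
        score = awarded.get(item["name"], 0)  # an unscored item counts as 0
        # Clamp so no item contributes less than 0 or more than its cap.
        total += min(max(score, 0), item["max_score"])
    return 100.0 * total / total_max


criteria = json.loads(CRITERIA_JSON)
full_marks = {item["name"]: item["max_score"] for item in criteria["checklist"]}

print(sum(item["max_score"] for item in criteria["checklist"]))  # 98
print(weighted_score(criteria, full_marks))  # 100.0
```

The twelve `max_score` values sum to 98 rather than 100, so normalizing by the computed total, instead of assuming a 100-point scale, keeps a perfect response at 100%.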

Files:

evals/
  scenario-1/
    criteria.json
    task.md
  scenario-2/
  scenario-3/
README.md
SKILL.md
tile.json

evals/scenario-1/criteria.json