Writes, refactors, and evaluates prompts for LLMs — generating optimized prompt templates, structured output schemas, evaluation rubrics, and test suites. Use when designing prompts for new LLM applications, refactoring existing prompts for better accuracy or token efficiency, implementing chain-of-thought or few-shot learning, creating system prompts with personas and guardrails, building JSON/function-calling schemas, or developing prompt evaluation frameworks to measure and improve model performance.
Score: 83
Quality: 78% (does it follow best practices?)
Impact: Pending; no eval scenarios have been run.
Passed; no known issues.
Optimize this skill with Tessl
`npx tessl skill review --optimize ./skills/prompt-engineer/SKILL.md`

Quality
Discovery: 100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that clearly defines its domain (prompt engineering for LLMs), lists specific concrete capabilities, and provides comprehensive trigger guidance via an explicit 'Use when...' clause with six distinct scenarios. It uses proper third-person voice throughout and includes highly relevant natural keywords that users would actually say when needing this skill.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: writes/refactors/evaluates prompts, generates optimized prompt templates, structured output schemas, evaluation rubrics, and test suites. Very detailed and actionable. | 3 / 3 |
| Completeness | Clearly answers both 'what' (writes, refactors, evaluates prompts, generates templates/schemas/rubrics/test suites) and 'when' with an explicit 'Use when...' clause listing six specific trigger scenarios. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'prompts', 'LLM', 'chain-of-thought', 'few-shot learning', 'system prompts', 'JSON', 'function-calling', 'token efficiency', 'evaluation', 'personas', 'guardrails'. These are terms prompt engineers and developers naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | Occupies a clear niche around prompt engineering specifically. The triggers are distinct — terms like 'prompt templates', 'chain-of-thought', 'few-shot learning', 'system prompts', and 'prompt evaluation frameworks' are unlikely to conflict with general coding or document skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation: 57%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-organized skill with strong progressive disclosure and useful concrete examples (zero-shot vs few-shot, before/after optimization). Its main weaknesses are moderate verbosity in sections that restate common knowledge (MUST DO/MUST NOT lists, 'When to Use' section) and a lack of truly executable evaluation/testing guidance — the workflow describes what to do conceptually but doesn't provide concrete tools or code for measuring prompt quality.
Suggestions:
- Trim the 'When to Use This Skill' section (it restates the description) and reduce the MUST DO/MUST NOT lists to only non-obvious, prompt-engineering-specific guidance.
- Add a concrete, executable example for evaluation — e.g., a Python snippet that runs a prompt against test cases and calculates accuracy/consistency metrics (a sketch of such a snippet follows below).
- Replace the abstract 'Output Templates' section with a concrete example of a complete deliverable (e.g., a filled-in prompt card with actual metrics and test results).
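To make the second suggestion concrete, here is a minimal sketch of such an evaluation harness, under stated assumptions: `call_model` is a hypothetical stand-in for a real LLM client, the sentiment template and test cases are toy examples, and the 80% threshold echoes the step-3 checkpoint noted under Workflow Clarity below.

```python
from dataclasses import dataclass


@dataclass
class TestCase:
    input_text: str
    expected: str


def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM client call; replace with your
    # provider's SDK. Returns a canned answer so the sketch runs end to end.
    return "positive" if "fantastic" in prompt else "negative"


def evaluate(template: str, cases: list[TestCase]) -> float:
    # Format each test case into the prompt, score exact-match, print failures.
    passed = 0
    for case in cases:
        output = call_model(template.format(input=case.input_text)).strip().lower()
        if output == case.expected:
            passed += 1
        else:
            print(f"FAIL: {case.input_text!r}: expected {case.expected!r}, got {output!r}")
    accuracy = passed / len(cases)
    print(f"Accuracy: {accuracy:.0%} ({passed}/{len(cases)})")
    return accuracy


template = (
    "Classify the sentiment of the following text as positive or negative.\n"
    "Text: {input}\n"
    "Sentiment:"
)
cases = [
    TestCase("The movie was fantastic!", "positive"),
    TestCase("Terrible service, never again.", "negative"),
]

if evaluate(template, cases) < 0.80:  # threshold from the skill's step-3 checkpoint
    print("Below 80% accuracy; iterate on the prompt before shipping.")
```

A real harness would also re-run each case several times to measure consistency, but the loop structure stays the same.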
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is reasonably well-structured but includes some unnecessary content that Claude already knows — e.g., the 'When to Use This Skill' section largely restates the description, the MUST DO/MUST NOT DO lists contain general software engineering wisdom (version control, don't hardcode secrets) that doesn't need to be spelled out, and the 'Coverage Note' section is filler. The examples earn their place, but overall it could be tightened. | 2 / 3 |
| Actionability | The before/after prompt examples are concrete and useful, and the zero-shot vs few-shot comparison is copy-paste ready. However, the core workflow is fairly abstract (no specific commands or tools for evaluation, no concrete metric calculation code), and the output templates section describes what to deliver without showing a concrete example of a deliverable. The skill tells Claude what to do conceptually but lacks executable evaluation or testing code. | 2 / 3 |
| Workflow Clarity | The 5-step core workflow is clearly sequenced and includes a validation checkpoint at step 3 with a specific threshold (accuracy < 80%) and failure pattern identification. However, the validation is somewhat vague — there's no concrete mechanism for how to measure accuracy or what tools to use. The feedback loop is mentioned but not fully operationalized with specific commands or scripts. | 2 / 3 |
| Progressive Disclosure | Excellent use of a reference table with clear 'Load When' guidance for six separate reference files. The main skill provides a concise overview with examples, and detailed guidance is appropriately deferred to one-level-deep references. Navigation is clear and well-signaled. | 3 / 3 |
| Total | | 9 / 12 Passed |
Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Skill structure validation: 11 / 11 checks passed. No warnings or errors.