Writes, refactors, and evaluates prompts for LLMs — generating optimized prompt templates, structured output schemas, evaluation rubrics, and test suites. Use when designing prompts for new LLM applications, refactoring existing prompts for better accuracy or token efficiency, implementing chain-of-thought or few-shot learning, creating system prompts with personas and guardrails, building JSON/function-calling schemas, or developing prompt evaluation frameworks to measure and improve model performance.
64
75%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Advisory
Suggest reviewing before use
Optimize this skill with Tessl
npx tessl skill review --optimize ./skills/prompt-engineer/SKILL.md
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that clearly defines a specific domain (prompt engineering for LLMs), lists concrete capabilities, and provides comprehensive trigger guidance via an explicit 'Use when...' clause with multiple natural scenarios. It uses proper third-person voice throughout and covers both common and advanced prompt engineering concepts, making it highly distinguishable from other skills.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: writes/refactors/evaluates prompts, generates optimized prompt templates, structured output schemas, evaluation rubrics, and test suites. Very detailed and actionable. | 3 / 3 |
| Completeness | Clearly answers both 'what' (writes, refactors, evaluates prompts, generates templates/schemas/rubrics/test suites) and 'when' with an explicit 'Use when...' clause listing six specific trigger scenarios. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'prompts', 'LLM', 'chain-of-thought', 'few-shot learning', 'system prompts', 'JSON', 'function-calling', 'token efficiency', 'evaluation', 'personas', 'guardrails'. These are terms prompt engineers and developers naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | Occupies a clear niche around prompt engineering specifically. The triggers are distinct: terms like 'prompt templates', 'chain-of-thought', 'few-shot learning', 'system prompts', and 'prompt evaluation frameworks' are unlikely to conflict with general coding or document skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
50%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a competent but somewhat generic prompt engineering skill that provides useful examples (zero-shot vs few-shot, before/after optimization) but lacks the concrete, executable depth needed for high scores. The workflow is reasonable but abstract, the constraints list includes obvious best practices, and the progressive disclosure structure references files that don't exist in the bundle. The skill would benefit from more specific, actionable guidance and actual supporting reference files.
Suggestions
Add concrete evaluation examples — show a specific test suite structure, scoring rubric template, or evaluation script rather than just describing what to deliver in the Output Templates section.
Trim the MUST DO/MUST NOT DO lists to only prompt-engineering-specific guidance that Claude wouldn't already know, removing general software engineering advice like 'version prompts and track changes systematically'.
Provide the referenced bundle files (e.g., references/prompt-patterns.md, references/evaluation-frameworks.md) or remove the reference table — currently it promises depth that doesn't exist.
Make the validation checkpoint in the workflow more concrete: specify how to measure accuracy (e.g., a simple scoring script template), what constitutes a good test set size, and provide an example of a failure pattern analysis.
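The last suggestion asks for a simple scoring script template. A minimal sketch of what one could look like follows, assuming exact-match scoring against labelled cases and using the 80% accuracy threshold that the review attributes to the skill's step 3 checkpoint. The names `PromptCase`, `evaluate`, and `fake_model` are hypothetical and do not come from the skill; `fake_model` stands in for a real LLM call so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable

# Threshold cited in the review for the workflow's validation checkpoint.
ACCURACY_THRESHOLD = 0.80


@dataclass
class PromptCase:
    """One labelled example; the field names are illustrative."""
    input_text: str
    expected: str


def evaluate(run_prompt: Callable[[str], str], cases: list[PromptCase]) -> dict:
    """Score a prompt by case-insensitive exact match and collect failures."""
    if not cases:
        raise ValueError("test set is empty")
    failures = []
    for case in cases:
        actual = run_prompt(case.input_text).strip().lower()
        if actual != case.expected.strip().lower():
            failures.append(
                {"input": case.input_text, "expected": case.expected, "actual": actual}
            )
    accuracy = 1 - len(failures) / len(cases)
    return {
        "accuracy": accuracy,
        "passed": accuracy >= ACCURACY_THRESHOLD,
        "failures": failures,
    }


def fake_model(text: str) -> str:
    """Hypothetical stand-in for an LLM call: a naive keyword classifier."""
    return "positive" if "good" in text else "negative"


cases = [
    PromptCase("good service", "positive"),
    PromptCase("bad service", "negative"),
    PromptCase("not good at all", "negative"),
]
report = evaluate(fake_model, cases)
print(f"accuracy={report['accuracy']:.2f} passed={report['passed']}")
for failure in report["failures"]:
    print("FAILED:", failure)
```

The `failures` list is the raw material for failure pattern analysis: here the single failing case is a negation ("not good at all"), which points at one specific weakness to address in the next prompt revision.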
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is mostly efficient but includes some unnecessary content. The 'When to Use This Skill' section largely duplicates the description, the 'Coverage Note' at the bottom is filler, and some constraint items state obvious best practices Claude already knows (e.g., 'Version prompts and track changes systematically'). The MUST DO/MUST NOT DO lists are somewhat verbose with items that are general software engineering wisdom rather than prompt-engineering-specific guidance. | 2 / 3 |
| Actionability | The before/after prompt examples and zero-shot vs few-shot comparisons are concrete and useful. However, the core workflow is fairly high-level and abstract ('Understand requirements', 'Design initial prompt'), and the skill lacks executable code or commands for evaluation, testing, or schema validation. The output templates section describes what to deliver but doesn't provide actual templates or schemas. | 2 / 3 |
| Workflow Clarity | The 5-step core workflow is clearly sequenced and includes a validation checkpoint at step 3 with a specific threshold (accuracy < 80%) and failure pattern identification. However, the validation is somewhat vague: there is no concrete guidance on how to measure accuracy, what tools to use, or how to structure the feedback loop. For a skill involving iterative refinement and evaluation, the workflow could benefit from more explicit checkpoints and concrete validation steps. | 2 / 3 |
| Progressive Disclosure | The reference table is well-structured with clear 'Load When' guidance and one-level-deep references to six topic-specific files. However, no bundle files are provided, meaning none of these references actually exist, which significantly undermines the progressive disclosure structure. The main SKILL.md also includes substantial inline content (examples, constraints, output templates) that could arguably be split into references, making the file longer than necessary. | 2 / 3 |
| Total | | 8 / 12 Passed |
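The Progressive Disclosure finding (references to files that are not in the bundle) is the kind of problem a small check can catch before publishing. The sketch below is one possible approach, not part of Tessl's tooling. It assumes references appear in SKILL.md as relative paths of the form `references/<name>.md`, as in the examples the review cites; the function name `missing_references` is hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Assumed convention: references are written as relative paths such as
# references/prompt-patterns.md inside SKILL.md.
REFERENCE_PATTERN = re.compile(r"references/[\w.-]+\.md")


def missing_references(skill_file: Path) -> list[str]:
    """Return referenced paths that do not exist next to the skill file."""
    text = skill_file.read_text(encoding="utf-8")
    referenced = sorted(set(REFERENCE_PATTERN.findall(text)))
    return [ref for ref in referenced if not (skill_file.parent / ref).is_file()]


# Self-contained demonstration in a temporary directory: one referenced
# file exists, the other does not.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "references").mkdir()
    (root / "references" / "prompt-patterns.md").write_text(
        "# Patterns\n", encoding="utf-8"
    )
    skill = root / "SKILL.md"
    skill.write_text(
        "See references/prompt-patterns.md and references/evaluation-frameworks.md\n",
        encoding="utf-8",
    )
    missing = missing_references(skill)

print(missing)
```

Run against a real skill directory, an empty result would mean every referenced file is present; anything listed is either a file to add or a table row to remove.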
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
e8be415