Master advanced prompt engineering techniques to maximize LLM performance, reliability, and controllability in production. Use when optimizing prompts, improving LLM outputs, or designing production prompt templates.
Overall score: 55%
Impact: Pending — no eval scenarios have been run.
Quality (does it follow best practices?): Passed — no known issues.
Discovery — 67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description has a solid structure with an explicit 'Use when...' clause, which is good for completeness. However, it relies on somewhat vague and buzzword-heavy language ('Master advanced', 'maximize', 'reliability and controllability') rather than listing concrete, specific techniques. The trigger terms cover the basics but miss many natural variations users might employ.
Suggestions
Replace vague opener 'Master advanced prompt engineering techniques' with specific actions like 'Designs system prompts, structures few-shot examples, applies chain-of-thought patterns, and optimizes prompt templates for production LLM applications'.
Expand trigger terms in the 'Use when' clause to include natural user phrases like 'system prompt', 'few-shot examples', 'prompt design', 'prompt writing', 'chain of thought', or 'prompt debugging'.
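Taken together, a rewritten description incorporating both suggestions might look like the following SKILL.md frontmatter sketch (the wording is illustrative, not a confirmed fix):

```yaml
# Hypothetical frontmatter with a more specific, trigger-rich description.
description: >
  Designs system prompts, structures few-shot examples, applies
  chain-of-thought patterns, and optimizes prompt templates for
  production LLM applications. Use when writing or debugging a system
  prompt, designing few-shot examples, applying chain of thought, or
  optimizing and testing prompt templates for production.
```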
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain ('prompt engineering') and some general actions ('optimizing prompts', 'improving LLM outputs', 'designing production prompt templates'), but lacks concrete, specific actions like 'chain-of-thought structuring', 'few-shot example design', or 'system prompt crafting'. The word 'Master' is vague fluff rather than a concrete action. | 2 / 3 |
| Completeness | Clearly answers both 'what' (advanced prompt engineering techniques for LLM performance, reliability, controllability) and 'when' with an explicit 'Use when...' clause covering optimizing prompts, improving LLM outputs, and designing production prompt templates. | 3 / 3 |
| Trigger Term Quality | Includes some relevant keywords like 'prompt engineering', 'LLM', 'prompts', and 'production prompt templates', but misses many natural user terms like 'system prompt', 'few-shot', 'chain of thought', 'prompt design', 'prompt optimization', 'AI instructions', or 'prompt writing'. | 2 / 3 |
| Distinctiveness / Conflict Risk | The domain of 'prompt engineering' is fairly specific, but terms like 'improving LLM outputs' and 'maximize LLM performance' are broad enough to potentially overlap with skills related to LLM evaluation, fine-tuning, or general AI development workflows. | 2 / 3 |
| Total | | 9 / 12 (Passed) |
Implementation — 22%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill reads more like a prompt engineering textbook or course outline than an actionable skill file for Claude. It is heavily padded with general knowledge Claude already possesses (best practices, common pitfalls, success metrics), uses a fictional Python library in its primary code example, and lacks concrete, executable workflows with validation checkpoints. The content would benefit greatly from being reduced to roughly 30% of its current size, focusing only on novel patterns and executable guidance.
Suggestions
Cut 'Core Capabilities', 'Best Practices', 'Common Pitfalls', 'Success Metrics', and 'Next Steps' sections entirely — these are general knowledge Claude already has. Focus only on novel, project-specific patterns and conventions.
Replace the fictional 'prompt_optimizer' library example with real, executable code or concrete prompt templates that can be directly used (e.g., actual prompt strings with placeholders, not imaginary API calls).
Add a concrete iterative workflow with validation checkpoints, e.g.: '1. Write initial prompt → 2. Test on 3 representative inputs → 3. Evaluate outputs against criteria → 4. If failures: identify failure mode and adjust → 5. Re-test until passing'.
Move the detailed inline content (integration patterns, performance optimization) to the referenced files and keep SKILL.md as a concise overview with clear pointers.
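To make the second and third suggestions concrete, here is a minimal sketch of what "real, executable code" plus one validation checkpoint of the iterative workflow could look like. All names (`SUMMARIZE_TEMPLATE`, `render_prompt`, `find_failures`) and the template text are illustrative assumptions, not content from the skill under review:

```python
# A copy-paste-ready prompt template: a plain string with named
# placeholders, not calls to an imaginary 'prompt_optimizer' library.
SUMMARIZE_TEMPLATE = (
    "You are a precise technical summarizer.\n"
    "Summarize the text below in at most {max_sentences} sentences.\n"
    "Respond with the summary only, no preamble.\n\n"
    "Text:\n{text}"
)

def render_prompt(text: str, max_sentences: int = 3) -> str:
    """Fill the template; str.format raises KeyError on a missing field."""
    return SUMMARIZE_TEMPLATE.format(text=text, max_sentences=max_sentences)

def find_failures(generate, cases):
    """One validation checkpoint of the suggested loop: run the prompt on
    representative inputs and return the inputs whose outputs fail their
    criterion, so the template can be adjusted and re-tested."""
    return [
        text
        for text, criterion in cases
        if not criterion(generate(render_prompt(text)))
    ]
```

In use, `generate` would wrap a real model call, and `cases` would hold the '3 representative inputs' with per-case pass criteria; re-running `find_failures` after each template edit gives the 'adjust and re-test until passing' loop the suggestion describes.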
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose and padded with information Claude already knows. Sections like 'Best Practices' ('Be Specific: Vague prompts produce inconsistent results'), 'Common Pitfalls', 'Success Metrics', and 'Core Capabilities' are all general knowledge that Claude possesses. The skill reads like a textbook chapter rather than actionable instructions. | 1 / 3 |
| Actionability | Contains some code examples (Quick Start, RAG integration, validation), but the code references a fictional 'prompt_optimizer' library that isn't real or executable. Most content is descriptive lists and abstract advice rather than concrete, copy-paste-ready guidance. The patterns section provides some structure but lacks executable specificity. | 2 / 3 |
| Workflow Clarity | No clear multi-step workflow with validation checkpoints. The 'Progressive Disclosure' pattern shows levels but isn't a workflow. The 'Next Steps' section is vague ('experiment with few-shot learning'). There's no iterative refinement loop with explicit validation steps, despite the skill being about prompt optimization, which inherently requires feedback loops. | 1 / 3 |
| Progressive Disclosure | References to external files are well-signaled at the bottom (references/, assets/, scripts/), which is good. However, the main file is a monolithic wall of text with extensive inline content that could be split into referenced files. The Core Capabilities section alone lists 25+ bullet points that should be in separate reference docs. | 2 / 3 |
| Total | | 6 / 12 (Passed) |
Validation — 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation checks: 11 / 11 passed.
Validation for skill structure: no warnings or errors.