Skill description under review: This skill should be used when the user asks to "optimize prompts", "design prompt templates", "evaluate LLM outputs", "build agentic systems", "implement RAG", "create few-shot examples", "analyze token usage", or "design AI workflows". Use for prompt engineering patterns, LLM evaluation frameworks, agent architectures, and structured output design.
Overall score: 76
Does it follow best practices? 53%
Impact: 91% (1.18x average score across 6 eval scenarios)
Advisory: suggest reviewing before use

Optimize this skill with Tessl:
npx tessl skill review --optimize ./engineering-team/senior-prompt-engineer/SKILL.md

Quality
Discovery
64%. Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description excels at providing trigger terms that users would naturally use, but it conflates trigger phrases with capability descriptions — it tells Claude when to use the skill but not clearly what the skill actually does. The scope is also very broad, covering multiple distinct AI/LLM domains, which increases conflict risk with more specialized skills.
Suggestions
Add explicit capability statements describing what the skill produces or does, e.g., 'Generates optimized prompt templates, designs evaluation rubrics for LLM outputs, architects multi-agent systems, and structures RAG pipelines.'
Consider narrowing the scope or adding clearer boundaries to reduce overlap risk with potential specialized skills for RAG, agent architectures, or LLM evaluation.
Rewrite to use third-person active voice for capabilities (e.g., 'Designs prompt templates and evaluates LLM outputs') rather than relying solely on quoted user phrases.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names the domain (prompt engineering, LLM evaluation, agentic systems) and lists several actions like 'optimize prompts', 'design prompt templates', 'evaluate LLM outputs', but these are mostly listed as trigger phrases rather than concrete capabilities the skill performs. It doesn't clearly describe what the skill actually does (e.g., 'Generates optimized prompt templates', 'Builds evaluation rubrics'). | 2 / 3 |
| Completeness | The 'when' is well-covered with explicit trigger phrases and a 'Use for' clause. However, the 'what does this do' part is weak: it lists trigger phrases but doesn't clearly describe the skill's concrete capabilities or outputs. The description reads more like a list of triggers than a balanced what+when explanation. | 2 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'optimize prompts', 'design prompt templates', 'evaluate LLM outputs', 'build agentic systems', 'implement RAG', 'create few-shot examples', 'analyze token usage', 'design AI workflows'. These are realistic phrases users would naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | The scope is quite broad, covering prompt engineering, RAG, agentic systems, evaluation frameworks, and AI workflows. This breadth means it could overlap with more specialized skills focused on any one of these areas (e.g., a dedicated RAG skill or an agent architecture skill). | 2 / 3 |
| Total | | 9 / 12 (Passed) |
Implementation
42%. Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill is well-structured with good progressive disclosure and clear navigation, but suffers significantly from verbosity. Large blocks of hypothetical tool output, ASCII diagrams, and explanations of concepts Claude already knows (prompt patterns, RAG evaluation metrics) inflate the token count without adding proportional value. The actionability is moderate—commands look concrete but depend on scripts whose existence is uncertain, and some workflow steps remain vague.
Suggestions
Remove or drastically reduce the sample output blocks (e.g., the full ASCII agent workflow diagram, detailed RAG evaluation output) — a one-line description of what the output contains is sufficient for Claude.
Delete the 'Common Patterns Quick Reference' table entirely — Claude already knows zero-shot, few-shot, chain-of-thought, role prompting, and structured output patterns.
Add explicit error handling/feedback loops to workflows, e.g., 'If optimization reduces quality on test cases, revert and try a different pattern from the table in Step 3.'
Clarify whether the referenced scripts actually exist in the project, or if they are aspirational — if they don't exist, provide inline Python code that accomplishes the same tasks.
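If the referenced scripts turn out to be aspirational, the comparison step in the prompt optimization workflow can be done inline instead. A minimal Python sketch of that idea, assuming per-scenario scores in [0, 1]; the function name, threshold, and scores are illustrative, not from the skill:

```python
from statistics import mean

def compare_prompts(baseline_scores, optimized_scores, min_gain=0.0):
    """Decide whether to adopt an optimized prompt based on eval scores.

    Each argument is a list of per-scenario scores in [0, 1].
    Returns (adopt, baseline_mean, optimized_mean).
    """
    b, o = mean(baseline_scores), mean(optimized_scores)
    # Adopt only on a measured improvement; otherwise keep the baseline.
    return o - b > min_gain, b, o

adopt, b, o = compare_prompts([0.70, 0.62, 0.81], [0.78, 0.74, 0.80])
```

Inlining the comparison this way also removes the dependency on `prompt_optimizer.py` existing in the project.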
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is extremely verbose at ~300+ lines. Much of the content is hypothetical tool output that Claude doesn't need spelled out (e.g., full ASCII art diagrams of agent workflows, detailed sample output blocks). The tool descriptions explain concepts Claude already understands (what RAG is, what few-shot learning is). The common patterns quick reference table teaches Claude things it already knows well. | 1 / 3 |
| Actionability | The commands are concrete and copy-paste ready, but they reference scripts (prompt_optimizer.py, rag_evaluator.py, agent_orchestrator.py) that may not actually exist in the project. The extensive sample outputs are illustrative but not executable. The workflows provide steps but mix concrete commands with vague instructions like 'Review the analysis report for...' and 'Run both prompts against your evaluation set.' | 2 / 3 |
| Workflow Clarity | The three workflows (Prompt Optimization, Few-Shot Design, Structured Output) have clear numbered steps and logical sequencing. However, validation steps are present but rely on assumed tooling. The prompt optimization workflow has a good compare step but lacks explicit error recovery or feedback loops (e.g., what to do if optimization degrades quality). No destructive operations are involved, but the validation is mostly 'run this script' without handling failure cases. | 2 / 3 |
| Progressive Disclosure | The skill has a clear table of contents, well-organized sections, and a reference documentation table that clearly signals when to load additional files (references/prompt_engineering_patterns.md, etc.) with specific trigger phrases. Content is appropriately split between the overview and referenced files, with one-level-deep navigation. | 3 / 3 |
| Total | | 8 / 12 (Passed) |
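The missing feedback loop flagged under Workflow Clarity could be stated explicitly in the skill. One possible shape, sketched here with an assumed `evaluate` callable that maps a prompt to a mean eval score (every name in this snippet is illustrative, not part of the skill):

```python
def optimize_with_fallback(evaluate, candidates, baseline_prompt):
    """Try candidate prompts in order; keep the best that beats the baseline.

    `candidates` is an ordered list of rewrites (e.g. few-shot or
    chain-of-thought variants). Falls back to the baseline prompt if no
    candidate improves on its score.
    """
    best_prompt, best_score = baseline_prompt, evaluate(baseline_prompt)
    for prompt in candidates:
        score = evaluate(prompt)
        if score > best_score:  # adopt only on measured improvement
            best_prompt, best_score = prompt, score
    return best_prompt, best_score

# Toy evaluator for demonstration: longer prompts score higher, capped at 1.0.
ev = lambda p: min(len(p) / 20, 1.0)
prompt, score = optimize_with_fallback(ev, ["short", "a much longer prompt"], "base")
```

Because the loop always retains the best-scoring prompt seen so far, a candidate that degrades quality is simply discarded, which is exactly the revert behavior the review recommends adding.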
Validation
100%. Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
967fe01