Skill description under review: This skill should be used when the user asks to "optimize prompts", "design prompt templates", "evaluate LLM outputs", "build agentic systems", "implement RAG", "create few-shot examples", "analyze token usage", or "design AI workflows". Use for prompt engineering patterns, LLM evaluation frameworks, agent architectures, and structured output design.
Overall score: 76
Does it follow best practices? 53%
Impact: 91% (1.18x average score across 6 eval scenarios)
Advisory: suggest reviewing before use

Optimize this skill with Tessl:
npx tessl skill review --optimize ./engineering-team/senior-prompt-engineer/SKILL.md

Quality
Discovery
64%. Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description excels at providing trigger terms that users would naturally use, but it conflates trigger phrases with capability descriptions — it tells Claude when to use the skill but not clearly what the skill actually does. The scope is also very broad, covering multiple distinct AI/LLM domains, which increases conflict risk with more specialized skills.
Suggestions
Add explicit capability statements describing what the skill produces or does, e.g., 'Generates optimized prompt templates, designs evaluation rubrics for LLM outputs, architects multi-agent systems, and structures RAG pipelines.'
Consider narrowing the scope or adding clearer boundaries to reduce overlap risk with potential specialized skills for RAG, agent architectures, or LLM evaluation.
Rewrite to use third-person active voice for capabilities (e.g., 'Designs prompt templates and evaluates LLM outputs') rather than relying solely on quoted user phrases.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names the domain (prompt engineering, LLM evaluation, agentic systems) and lists several actions like 'optimize prompts', 'design prompt templates', 'evaluate LLM outputs', but these are mostly listed as trigger phrases rather than concrete capabilities the skill performs. It doesn't clearly describe what the skill actually does (e.g., 'Generates optimized prompt templates', 'Builds evaluation rubrics'). | 2 / 3 |
| Completeness | The 'when' is well-covered with explicit trigger phrases and a 'Use for' clause. However, the 'what does this do' part is weak: it lists trigger phrases but doesn't clearly describe the skill's concrete capabilities or outputs. The description reads more like a list of triggers than a balanced what+when explanation. | 2 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'optimize prompts', 'design prompt templates', 'evaluate LLM outputs', 'build agentic systems', 'implement RAG', 'create few-shot examples', 'analyze token usage', 'design AI workflows'. These are realistic phrases users would naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | The scope is quite broad, covering prompt engineering, RAG, agentic systems, evaluation frameworks, and AI workflows. This breadth means it could overlap with more specialized skills focused on any one of these areas (e.g., a dedicated RAG skill or an agent architecture skill). | 2 / 3 |
| Total | | 9 / 12 (Passed) |
Implementation
42%. Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill is well-structured with good progressive disclosure and clear navigation, but suffers significantly from verbosity. Large blocks of hypothetical tool output, ASCII diagrams, and explanations of concepts Claude already knows (prompt patterns, RAG evaluation metrics) inflate the token count without adding proportional value. The actionability is moderate—commands look concrete but depend on scripts whose existence is uncertain, and some workflow steps remain vague.
Suggestions
Remove or drastically reduce the sample output blocks (e.g., the full ASCII agent workflow diagram, detailed RAG evaluation output) — a one-line description of what the output contains is sufficient for Claude.
Delete the 'Common Patterns Quick Reference' table entirely — Claude already knows zero-shot, few-shot, chain-of-thought, role prompting, and structured output patterns.
Add explicit error handling/feedback loops to workflows, e.g., 'If optimization reduces quality on test cases, revert and try a different pattern from the table in Step 3.'
Clarify whether the referenced scripts actually exist in the project, or if they are aspirational — if they don't exist, provide inline Python code that accomplishes the same tasks.
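If the referenced scripts turn out to be aspirational, the comparison step in the prompt optimization workflow can be done inline instead. A minimal Python sketch of that idea, assuming per-scenario scores in [0, 1]; the function name, threshold, and scores are illustrative, not from the skill:

```python
from statistics import mean

def compare_prompts(baseline_scores, optimized_scores, min_gain=0.0):
    """Decide whether to adopt an optimized prompt based on eval scores.

    Each argument is a list of per-scenario scores in [0, 1].
    Returns (adopt, baseline_mean, optimized_mean).
    """
    b, o = mean(baseline_scores), mean(optimized_scores)
    # Adopt only on a measured improvement; otherwise keep the baseline.
    return o - b > min_gain, b, o

adopt, b, o = compare_prompts([0.70, 0.62, 0.81], [0.78, 0.74, 0.80])
```

Inlining the comparison this way also removes the dependency on `prompt_optimizer.py` existing in the project.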
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is extremely verbose at ~300+ lines. Much of the content is hypothetical tool output that Claude doesn't need spelled out (e.g., full ASCII art diagrams of agent workflows, detailed sample output blocks). The tool descriptions explain concepts Claude already understands (what RAG is, what few-shot learning is). The common patterns quick reference table teaches Claude things it already knows well. | 1 / 3 |
| Actionability | The commands are concrete and copy-paste ready, but they reference scripts (prompt_optimizer.py, rag_evaluator.py, agent_orchestrator.py) that may not actually exist in the project. The extensive sample outputs are illustrative but not executable. The workflows provide steps but mix concrete commands with vague instructions like 'Review the analysis report for...' and 'Run both prompts against your evaluation set.' | 2 / 3 |
| Workflow Clarity | The three workflows (Prompt Optimization, Few-Shot Design, Structured Output) have clear numbered steps and logical sequencing. However, validation steps are present but rely on assumed tooling. The prompt optimization workflow has a good compare step but lacks explicit error recovery or feedback loops (e.g., what to do if optimization degrades quality). No destructive operations are involved, but the validation is mostly 'run this script' without handling failure cases. | 2 / 3 |
| Progressive Disclosure | The skill has a clear table of contents, well-organized sections, and a reference documentation table that clearly signals when to load additional files (references/prompt_engineering_patterns.md, etc.) with specific trigger phrases. Content is appropriately split between the overview and referenced files, with one-level-deep navigation. | 3 / 3 |
| Total | | 8 / 12 (Passed) |
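The missing feedback loop flagged under Workflow Clarity could be stated explicitly in the skill. One possible shape, sketched here with an assumed `evaluate` callable that maps a prompt to a mean eval score (every name in this snippet is illustrative, not part of the skill):

```python
def optimize_with_fallback(evaluate, candidates, baseline_prompt):
    """Try candidate prompts in order; keep the best that beats the baseline.

    `candidates` is an ordered list of rewrites (e.g. few-shot or
    chain-of-thought variants). Falls back to the baseline prompt if no
    candidate improves on its score.
    """
    best_prompt, best_score = baseline_prompt, evaluate(baseline_prompt)
    for prompt in candidates:
        score = evaluate(prompt)
        if score > best_score:  # adopt only on measured improvement
            best_prompt, best_score = prompt, score
    return best_prompt, best_score

# Toy evaluator for demonstration: longer prompts score higher, capped at 1.0.
ev = lambda p: min(len(p) / 20, 1.0)
prompt, score = optimize_with_fallback(ev, ["short", "a much longer prompt"], "base")
```

Because the loop always retains the best-scoring prompt seen so far, a candidate that degrades quality is simply discarded, which is exactly the revert behavior the review recommends adding.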
Validation
100%. Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
967fe01