
senior-prompt-engineer

This skill should be used when the user asks to "optimize prompts", "design prompt templates", "evaluate LLM outputs", "build agentic systems", "implement RAG", "create few-shot examples", "analyze token usage", or "design AI workflows". Use for prompt engineering patterns, LLM evaluation frameworks, agent architectures, and structured output design.

76 · 1.18x

Quality: 53% (Does it follow best practices?)

Impact: 91% · 1.18x (average score across 6 eval scenarios)

Security by Snyk: Advisory. Suggest reviewing before use.

Optimize this skill with Tessl

npx tessl skill review --optimize ./engineering-team/senior-prompt-engineer/SKILL.md

Quality

Discovery: 64%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description excels at providing trigger terms that users would naturally use, but it conflates trigger phrases with capability descriptions — it tells Claude when to use the skill but not clearly what the skill actually does. The scope is also very broad, covering multiple distinct AI/LLM domains, which increases conflict risk with more specialized skills.

Suggestions

Add explicit capability statements describing what the skill produces or does, e.g., 'Generates optimized prompt templates, designs evaluation rubrics for LLM outputs, architects multi-agent systems, and structures RAG pipelines.'

Consider narrowing the scope or adding clearer boundaries to reduce overlap risk with potential specialized skills for RAG, agent architectures, or LLM evaluation.

Rewrite to use third-person active voice for capabilities (e.g., 'Designs prompt templates and evaluates LLM outputs') rather than relying solely on quoted user phrases.

Dimension scores

Specificity (2 / 3)
The description names the domain (prompt engineering, LLM evaluation, agentic systems) and lists several actions like 'optimize prompts', 'design prompt templates', and 'evaluate LLM outputs', but these appear mostly as trigger phrases rather than concrete capabilities the skill performs. It doesn't clearly describe what the skill actually does (e.g., 'Generates optimized prompt templates', 'Builds evaluation rubrics').

Completeness (2 / 3)
The 'when' is well covered with explicit trigger phrases and a 'Use for' clause. However, the 'what does this do' part is weak: the description lists trigger phrases but doesn't clearly describe the skill's concrete capabilities or outputs. It reads more like a list of triggers than a balanced what+when explanation.

Trigger Term Quality (3 / 3)
Excellent coverage of natural terms users would say: 'optimize prompts', 'design prompt templates', 'evaluate LLM outputs', 'build agentic systems', 'implement RAG', 'create few-shot examples', 'analyze token usage', 'design AI workflows'. These are realistic phrases users would naturally use.

Distinctiveness / Conflict Risk (2 / 3)
The scope is quite broad, covering prompt engineering, RAG, agentic systems, evaluation frameworks, and AI workflows. This breadth means it could overlap with more specialized skills focused on any one of these areas (e.g., a dedicated RAG skill or an agent architecture skill).

Total: 9 / 12. Passed.

Implementation: 42%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The skill is well-structured with good progressive disclosure and clear navigation, but suffers significantly from verbosity. Large blocks of hypothetical tool output, ASCII diagrams, and explanations of concepts Claude already knows (prompt patterns, RAG evaluation metrics) inflate the token count without adding proportional value. The actionability is moderate—commands look concrete but depend on scripts whose existence is uncertain, and some workflow steps remain vague.

Suggestions

Remove or drastically reduce the sample output blocks (e.g., the full ASCII agent workflow diagram, detailed RAG evaluation output) — a one-line description of what the output contains is sufficient for Claude.

Delete the 'Common Patterns Quick Reference' table entirely — Claude already knows zero-shot, few-shot, chain-of-thought, role prompting, and structured output patterns.

Add explicit error handling/feedback loops to workflows, e.g., 'If optimization reduces quality on test cases, revert and try a different pattern from the table in Step 3.'

Clarify whether the referenced scripts actually exist in the project, or if they are aspirational — if they don't exist, provide inline Python code that accomplishes the same tasks.
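The last two suggestions can be sketched together. The following is a minimal, hypothetical stand-in for a prompt_optimizer.py-style script, with the error-handling suggestion built in (revert to the baseline if no candidate improves quality on the test cases). The function names, the test-case shape, and the call_model hook are illustrative assumptions, not part of the skill:

```python
"""Hypothetical inline replacement for a prompt_optimizer.py-style script.

All names here (optimize_prompt, call_model, the test-case dict shape)
are assumptions for illustration; the real skill may expect different tooling.
"""
from typing import Callable


def optimize_prompt(
    baseline: str,
    candidates: list[str],
    test_cases: list[dict],  # each: {"input": str, "expected": str}
    call_model: Callable[[str, str], str],
) -> str:
    """Return the best-scoring prompt; revert to baseline on regression."""

    def score(prompt: str) -> float:
        # Fraction of test cases whose expected string appears in the output.
        hits = sum(
            1
            for case in test_cases
            if case["expected"] in call_model(prompt, case["input"])
        )
        return hits / len(test_cases)

    best_prompt, best_score = baseline, score(baseline)
    for candidate in candidates:
        candidate_score = score(candidate)
        # Explicit feedback loop: only keep a candidate that strictly
        # improves on the current best; otherwise the baseline survives.
        if candidate_score > best_score:
            best_prompt, best_score = candidate, candidate_score
    return best_prompt


if __name__ == "__main__":
    # Toy model stub: classifies by keyword so the sketch runs without an API.
    def fake_model(prompt: str, text: str) -> str:
        return "positive" if "good" in text else "negative"

    cases = [
        {"input": "good service", "expected": "positive"},
        {"input": "bad service", "expected": "negative"},
    ]
    best = optimize_prompt(
        "Classify sentiment.",
        ["Classify the review as positive or negative."],
        cases,
        fake_model,
    )
    print(best)
```

Inlining a sketch like this in the skill would resolve the uncertainty the review flags: the workflow no longer depends on a script that may not exist, and the revert-on-regression branch gives Claude a defined failure path.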

Dimension scores

Conciseness (1 / 3)
The skill is extremely verbose at more than 300 lines. Much of the content is hypothetical tool output that Claude doesn't need spelled out (e.g., full ASCII art diagrams of agent workflows, detailed sample output blocks). The tool descriptions explain concepts Claude already understands (what RAG is, what few-shot learning is), and the common patterns quick reference table teaches Claude things it already knows well.

Actionability (2 / 3)
The commands are concrete and copy-paste ready, but they reference scripts (prompt_optimizer.py, rag_evaluator.py, agent_orchestrator.py) that may not actually exist in the project. The extensive sample outputs are illustrative but not executable. The workflows provide steps but mix concrete commands with vague instructions like 'Review the analysis report for...' and 'Run both prompts against your evaluation set.'

Workflow Clarity (2 / 3)
The three workflows (Prompt Optimization, Few-Shot Design, Structured Output) have clear numbered steps and logical sequencing. However, validation steps rely on assumed tooling. The prompt optimization workflow has a good compare step but lacks explicit error recovery or feedback loops (e.g., what to do if optimization degrades quality). No destructive operations are involved, but validation is mostly 'run this script' without handling failure cases.

Progressive Disclosure (3 / 3)
The skill has a clear table of contents, well-organized sections, and a reference documentation table that clearly signals when to load additional files (references/prompt_engineering_patterns.md, etc.) with specific trigger phrases. Content is appropriately split between the overview and referenced files, with one-level-deep navigation.

Total: 8 / 12. Passed.

Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure: 11 / 11 checks passed. No warnings or errors.

Repository: alirezarezvani/claude-skills (reviewed)

