
prompt-engineer

Writes, refactors, and evaluates prompts for LLMs — generating optimized prompt templates, structured output schemas, evaluation rubrics, and test suites. Use when designing prompts for new LLM applications, refactoring existing prompts for better accuracy or token efficiency, implementing chain-of-thought or few-shot learning, creating system prompts with personas and guardrails, building JSON/function-calling schemas, or developing prompt evaluation frameworks to measure and improve model performance.

83

Quality: 78%
Does it follow best practices?

Impact: Pending
No eval scenarios have been run.

Security (by Snyk): Passed
No known issues.

Optimize this skill with Tessl

npx tessl skill review --optimize ./skills/prompt-engineer/SKILL.md

Quality

Discovery: 100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that clearly defines its domain (prompt engineering for LLMs), lists specific concrete capabilities, and provides comprehensive trigger guidance via an explicit 'Use when...' clause with six distinct scenarios. It uses proper third-person voice throughout and includes highly relevant natural keywords that users would actually say when needing this skill.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: writes/refactors/evaluates prompts, generates optimized prompt templates, structured output schemas, evaluation rubrics, and test suites. Very detailed and actionable. | 3 / 3 |
| Completeness | Clearly answers both 'what' (writes, refactors, evaluates prompts, generates templates/schemas/rubrics/test suites) and 'when' with an explicit 'Use when...' clause listing six specific trigger scenarios. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural terms users would say: 'prompts', 'LLM', 'chain-of-thought', 'few-shot learning', 'system prompts', 'JSON', 'function-calling', 'token efficiency', 'evaluation', 'personas', 'guardrails'. These are terms prompt engineers and developers naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | Occupies a clear niche around prompt engineering specifically. The triggers are distinct: terms like 'prompt templates', 'chain-of-thought', 'few-shot learning', 'system prompts', and 'prompt evaluation frameworks' are unlikely to conflict with general coding or document skills. | 3 / 3 |
| **Total** | | **12 / 12** |

Passed

Implementation: 57%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-organized skill with strong progressive disclosure and useful concrete examples (zero-shot vs few-shot, before/after optimization). Its main weaknesses are moderate verbosity in sections that restate common knowledge (MUST DO/MUST NOT lists, 'When to Use' section) and a lack of truly executable evaluation/testing guidance — the workflow describes what to do conceptually but doesn't provide concrete tools or code for measuring prompt quality.

Suggestions

- Trim the 'When to Use This Skill' section (it restates the description) and reduce the MUST DO/MUST NOT lists to only non-obvious, prompt-engineering-specific guidance.
- Add a concrete, executable example for evaluation, e.g. a Python snippet that runs a prompt against test cases and calculates accuracy/consistency metrics.
- Replace the abstract 'Output Templates' section with a concrete example of a complete deliverable (e.g., a filled-in prompt card with actual metrics and test results).
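The evaluation snippet suggested above could be sketched roughly as follows. This is a minimal, illustrative harness, not the skill's actual code: `model_fn` is a placeholder for whatever LLM call is being evaluated, and the sentiment template and test cases are invented for the example.

```python
def evaluate_prompt(template, test_cases, model_fn):
    """Run a prompt template over test cases and compute exact-match accuracy.

    model_fn: any callable mapping a prompt string to a completion string
    (e.g. a thin wrapper around an LLM API).
    """
    results = []
    for case in test_cases:
        prompt = template.format(**case["inputs"])
        output = model_fn(prompt).strip().lower()
        expected = case["expected"].strip().lower()
        results.append({"prompt": prompt, "output": output,
                        "passed": output == expected})
    accuracy = sum(r["passed"] for r in results) / len(results)
    return accuracy, results

# Usage with a stubbed model (replace the stub with a real API call):
template = "Classify the sentiment of: {text}\nAnswer (positive/negative):"
cases = [
    {"inputs": {"text": "I love this"}, "expected": "positive"},
    {"inputs": {"text": "Terrible experience"}, "expected": "negative"},
]
stub = lambda prompt: "positive"  # degenerate model: always says "positive"
accuracy, results = evaluate_prompt(template, cases, stub)
print(f"accuracy: {accuracy:.0%}")  # prints: accuracy: 50%
```

Exact-match scoring is the simplest consistency metric; a real harness would likely add fuzzy matching, repeated sampling for consistency, and per-case logging.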

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is reasonably well-structured but includes some unnecessary content that Claude already knows: the 'When to Use This Skill' section largely restates the description, the MUST DO/MUST NOT DO lists contain general software engineering wisdom (version control, don't hardcode secrets) that doesn't need to be spelled out, and the 'Coverage Note' section is filler. The examples earn their place, but overall it could be tightened. | 2 / 3 |
| Actionability | The before/after prompt examples are concrete and useful, and the zero-shot vs few-shot comparison is copy-paste ready. However, the core workflow is fairly abstract (no specific commands or tools for evaluation, no concrete metric calculation code), and the output templates section describes what to deliver without showing a concrete example of a deliverable. The skill tells Claude what to do conceptually but lacks executable evaluation or testing code. | 2 / 3 |
| Workflow Clarity | The 5-step core workflow is clearly sequenced and includes a validation checkpoint at step 3 with a specific threshold (accuracy < 80%) and failure pattern identification. However, the validation is somewhat vague: there's no concrete mechanism for how to measure accuracy or what tools to use. The feedback loop is mentioned but not fully operationalized with specific commands or scripts. | 2 / 3 |
| Progressive Disclosure | Excellent use of a reference table with clear 'Load When' guidance for six separate reference files. The main skill provides a concise overview with examples, and detailed guidance is appropriately deferred to one-level-deep references. Navigation is clear and well-signaled. | 3 / 3 |
| **Total** | | **9 / 12** |

Passed
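The step-3 validation checkpoint criticized above (gate on accuracy < 80%, then identify failure patterns) could be operationalized along these lines. This is a hedged sketch: it assumes each evaluation result is a dict with `output` and `passed` keys, which is an invented shape, not the skill's actual format.

```python
from collections import Counter

THRESHOLD = 0.80  # the accuracy gate named in the skill's workflow

def validation_checkpoint(results):
    """Pass the gate if accuracy meets the threshold; otherwise
    return the most common wrong outputs as candidate failure patterns."""
    accuracy = sum(r["passed"] for r in results) / len(results)
    if accuracy >= THRESHOLD:
        return True, []
    # Tally wrong outputs so recurring failure modes stand out.
    failures = Counter(r["output"] for r in results if not r["passed"])
    return False, failures.most_common(3)

# Example with invented results: 2 of 4 cases pass (50%, below the gate).
results = [
    {"output": "positive", "passed": True},
    {"output": "neutral", "passed": False},
    {"output": "neutral", "passed": False},
    {"output": "positive", "passed": True},
]
ok, patterns = validation_checkpoint(results)
print(ok, patterns)  # prints: False [('neutral', 2)]
```

A failing checkpoint like this would feed the top failure patterns back into the prompt-revision step, making the feedback loop the review asks for concrete.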

Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: jeffallan/claude-skills (Reviewed)
