AI-first application patterns, LLM testing, prompt management
Quality: 39% (Does it follow best practices?)
Impact: — (No eval scenarios have been run)
Validation: Passed (No known issues)
Optimize this skill with Tessl:
npx tessl skill review --optimize ./skills/llm-patterns/SKILL.md

Quality
Discovery
14%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description is essentially a list of three broad topic labels with no concrete actions, no explicit trigger guidance, and no clear niche. It reads more like a set of tags than a functional skill description, making it very difficult for Claude to know when to select this skill over others.
Suggestions
Add concrete actions describing what the skill does, e.g., 'Guides building AI-first applications, writing and managing prompts, and setting up LLM evaluation pipelines.'
Add an explicit 'Use when...' clause with trigger terms, e.g., 'Use when the user asks about building apps with LLMs, testing AI outputs, writing evals, prompt engineering, or prompt versioning.'
Include common natural-language variations users might say, such as 'evals', 'prompt engineering', 'AI app architecture', 'LLM evaluation', 'prompt versioning', to improve trigger term coverage and distinctiveness.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description lists broad topic areas ('AI-first application patterns', 'LLM testing', 'prompt management') but does not describe any concrete actions. There are no verbs indicating what the skill actually does. | 1 / 3 |
| Completeness | The description weakly addresses 'what' (topic areas only, no actions) and completely omits 'when'; there is no 'Use when...' clause or any explicit trigger guidance. | 1 / 3 |
| Trigger Term Quality | Terms like 'LLM testing', 'prompt management', and 'AI-first' are somewhat relevant keywords a user might mention, but they lack common variations (e.g., 'prompt engineering', 'evaluations', 'evals', 'AI app development') and are fairly broad. | 2 / 3 |
| Distinctiveness / Conflict Risk | The terms are very broad and could overlap with many AI/ML-related skills. 'AI-first application patterns' and 'prompt management' are generic enough to conflict with coding skills, prompt engineering skills, or general AI development skills. | 1 / 3 |
| Total | | 5 / 12 (Passed) |
Implementation
64%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a solid, actionable skill with excellent executable code examples covering LLM client patterns, prompt management, testing strategies, and CI integration. Its main weaknesses are moderate verbosity (some sections explain concepts Claude already knows), lack of explicit error handling/validation workflows for LLM operations, and a monolithic structure that would benefit from splitting into referenced sub-documents.
Suggestions
Add explicit error handling workflows: show what happens when JSON parsing fails, when schema validation rejects a response, and how to implement retry/fallback logic; this is critical for LLM operations (a minimal sketch follows this list).
Split the file into a concise overview SKILL.md with references to separate files for testing patterns (TESTING.md), CI configuration (CI.md), and the cost tracking module.
Trim the 'Core Principle' section — Claude already knows when to use LLMs vs traditional code. A single sentence like 'LLM for classification/extraction/generation; code for validation/routing/auth' suffices.
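For illustration, here is a minimal sketch of such a retry/fallback loop, assuming a Zod schema and a generic callLLM wrapper (both hypothetical names standing in for whatever the skill actually defines):

```ts
import { z } from "zod";

// Hypothetical schema; stands in for whatever Zod schema the skill defines.
const TicketSchema = z.object({
  category: z.enum(["bug", "feature", "question"]),
  priority: z.enum(["low", "medium", "high"]),
});
type Ticket = z.infer<typeof TicketSchema>;

// Stand-in for the skill's typed LLM wrapper; replace with the real client call.
declare function callLLM(prompt: string): Promise<string>;

async function classifyWithRetry(prompt: string, maxAttempts = 3): Promise<Ticket> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callLLM(prompt);
    try {
      // JSON.parse throws on malformed output; safeParse rejects wrong shapes.
      const result = TicketSchema.safeParse(JSON.parse(raw));
      if (result.success) return result.data;
      lastError = result.error;
    } catch (err) {
      lastError = err;
    }
    // A common refinement: feed lastError back into the next prompt so the
    // model can correct its own output on retry.
  }
  // Fallback: fail loudly with a typed error rather than returning raw output.
  throw new Error(`LLM output failed validation after ${maxAttempts} attempts: ${String(lastError)}`);
}
```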
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is fairly well-structured but includes some unnecessary verbosity. The 'Core Principle' section explaining what LLMs vs code should handle is somewhat obvious to Claude. The project structure tree and some of the inline comments add bulk without proportional value. However, the code examples themselves are reasonably tight. | 2 / 3 |
| Actionability | The skill provides fully executable TypeScript code examples throughout: a typed LLM wrapper, Zod schemas, prompt templates, three tiers of testing, GitHub Actions YAML, and cost tracking. All examples are copy-paste ready with realistic implementations. | 3 / 3 |
| Workflow Clarity | The testing section presents a clear three-tier strategy (mocks → fixtures → evals) with good sequencing (a tier-1 sketch follows this table), and the CI workflow shows when each runs. However, there are no explicit validation checkpoints or error recovery steps for the core LLM call workflow (e.g., what to do when schema parsing fails, retry logic, fallback behavior). The anti-patterns section mentions handling failures but doesn't show how. | 2 / 3 |
| Progressive Disclosure | The content is well-organized with clear section headers, but it's a monolithic ~250-line file with no references to external files. The project structure suggests files like prompts/, schemas, and services that could be separate reference documents. Testing patterns, CI config, and cost tracking could each be split out for better navigation. | 2 / 3 |
| Total | | 9 / 12 (Passed) |
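To make the tier-1 layer concrete, here is a minimal mock-based test sketch using Node's built-in test runner; classifyTicket and its injected LLM call are hypothetical stand-ins for the skill's own service code:

```ts
import { test } from "node:test";
import assert from "node:assert/strict";

// The LLM call is injected so tier-1 tests never touch the network or spend tokens.
type LLMCall = (prompt: string) => Promise<string>;

async function classifyTicket(text: string, llm: LLMCall): Promise<string> {
  const raw = await llm(`Classify this support ticket: ${text}`);
  return JSON.parse(raw).category;
}

test("tier 1: mocked LLM exercises parsing and control flow", async () => {
  // Canned response: deterministic and free to run on every commit.
  const mockLLM: LLMCall = async () => JSON.stringify({ category: "bug" });
  assert.equal(await classifyTicket("app crashes on login", mockLLM), "bug");
});
```

Tier 2 would swap the mock for recorded real responses (fixtures), and tier 3 (evals) would call the live model on whatever cadence the CI workflow defines.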
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 (Passed) |