AI-first application patterns, LLM testing, prompt management
40
39%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./skills/llm-patterns/SKILL.md
Quality
Discovery
14%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description is essentially a list of topic tags rather than a functional skill description. It lacks concrete actions, explicit trigger guidance, and sufficient specificity to distinguish it from other AI-related skills. It would be very difficult for Claude to reliably select this skill from a pool of similar options.
Suggestions
Add concrete actions describing what the skill does, e.g., 'Guides building AI-first applications, writing and managing prompts, and setting up LLM evaluation frameworks.'
Add an explicit 'Use when...' clause with trigger terms, e.g., 'Use when the user asks about building apps with LLMs, testing model outputs, writing evals, prompt engineering, or structuring AI-powered workflows.'
Include common natural-language variations of key terms (e.g., 'evals', 'prompt engineering', 'AI app architecture', 'model evaluation') to improve trigger term coverage and distinctiveness.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description lists broad topic areas ('AI-first application patterns', 'LLM testing', 'prompt management') but does not describe any concrete actions. There are no verbs indicating what the skill actually does. | 1 / 3 |
| Completeness | The description weakly addresses 'what' (topic areas only, no actions) and completely lacks any 'when' guidance. There is no 'Use when...' clause or equivalent trigger guidance. | 1 / 3 |
| Trigger Term Quality | Terms like 'LLM testing', 'prompt management', and 'AI-first' are somewhat relevant keywords a user might mention, but they miss common variations (e.g., 'prompt engineering', 'evaluations', 'evals', 'AI app development', 'model testing') and are fairly broad. | 2 / 3 |
| Distinctiveness Conflict Risk | The terms are very broad and could overlap with many AI/ML-related skills. 'AI-first application patterns' and 'prompt management' are generic enough to conflict with coding skills, AI documentation skills, or prompt engineering skills. | 1 / 3 |
| Total | 5 / 12 Passed | |
Implementation
64%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a solid, actionable skill with excellent executable code examples covering the full lifecycle of LLM-integrated applications. Its main weaknesses are moderate verbosity (some sections explain concepts Claude already knows), a lack of explicit error handling/validation workflows for LLM calls, and a monolithic structure that could benefit from splitting into referenced sub-files for testing and CI patterns.
Suggestions
Add explicit error handling workflow with validation checkpoints: what to do when JSON parsing fails, when schema validation fails, retry strategies with backoff — show concrete code for these recovery paths.
Split the testing section (mocks, fixtures, evals) and CI configuration into a separate TESTING.md file, referenced from the main skill, to reduce the monolithic structure.
Remove or significantly trim the 'Core Principle' section — Claude already understands when to use LLMs vs traditional code; a single-line reminder suffices.
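The first suggestion above asks for concrete recovery paths when JSON parsing or schema validation fails. A minimal sketch of what that could look like follows, assuming a hand-rolled validator in place of the Zod schemas the skill is said to use. Every name here (`LlmCall`, `Validator`, `callWithRecovery`, `validateSummary`) is hypothetical and does not come from the reviewed skill.

```typescript
// Hypothetical sketch: none of these names come from the reviewed skill.

// Any function that sends a prompt and resolves with the raw model text.
type LlmCall = (prompt: string) => Promise<string>;

// Throws on schema mismatch; a Zod schema's parse method has this shape.
type Validator<T> = (value: unknown) => T;

interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function callWithRecovery<T>(
  call: LlmCall,
  prompt: string,
  validate: Validator<T>,
  opts: RetryOptions = { maxAttempts: 3, baseDelayMs: 200 },
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      const raw = await call(prompt);
      const parsed: unknown = JSON.parse(raw); // checkpoint 1: well-formed JSON
      return validate(parsed); // checkpoint 2: expected shape
    } catch (err) {
      // Any failure (call error, malformed JSON, schema mismatch) lands here.
      lastError = err;
      if (attempt < opts.maxAttempts) {
        // Exponential backoff: baseDelayMs, then 2x, then 4x, and so on.
        await sleep(opts.baseDelayMs * Math.pow(2, attempt - 1));
      }
    }
  }
  throw new Error(
    `LLM call failed after ${opts.maxAttempts} attempts: ${String(lastError)}`,
  );
}

// Example validator for an assumed response shape { title: string }.
interface Summary {
  title: string;
}

function validateSummary(value: unknown): Summary {
  if (typeof value === "object" && value !== null) {
    const title = (value as { title?: unknown }).title;
    if (typeof title === "string") {
      return { title };
    }
  }
  throw new Error("schema mismatch: expected { title: string }");
}
```

A caller would pass its own client function, for example `await callWithRecovery(myClient, prompt, validateSummary)`, where `myClient` is whatever wrapper the project already has. This sketch retries every failure the same way; a real implementation might feed the validation error back into the prompt or treat rate-limit errors separately.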
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is fairly well-structured but includes some unnecessary verbosity. The 'Core Principle' section explaining what LLMs vs code should handle is somewhat obvious to Claude. The project structure tree, while useful, is lengthy. The anti-patterns list at the end restates things already demonstrated in the code examples. However, the code examples themselves are lean and purposeful. | 2 / 3 |
| Actionability | The skill provides fully executable TypeScript code examples throughout: typed LLM wrapper, Zod schemas, prompt templates, three tiers of testing, GitHub Actions YAML, and cost tracking. All examples are copy-paste ready with realistic implementations and concrete patterns. | 3 / 3 |
| Workflow Clarity | The skill presents a clear three-tier testing strategy (mocks → fixtures → evals) with good sequencing, and the CI workflow distinguishes fast vs nightly runs. However, there are no explicit validation checkpoints or error recovery steps for the LLM call workflow itself, e.g., what to do when schema parsing fails, retry logic, or how to handle malformed JSON responses. The anti-patterns mention handling failures but don't show how. | 2 / 3 |
| Progressive Disclosure | The content is well-organized with clear section headers and logical progression from client setup to testing to CI. However, at ~250 lines this is a monolithic document that could benefit from splitting detailed testing patterns and CI configuration into separate referenced files. No bundle files exist to offload content, and no references to external files are provided. | 2 / 3 |
| Total | 9 / 12 Passed | |
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |
7e5f7a2