AI-first application patterns, LLM testing, prompt management
Score: 51 (39%)

Does it follow best practices?

Impact: Pending (no eval scenarios have been run)

Passed (no known issues)
Optimize this skill with Tessl:

`npx tessl skill review --optimize ./skills/llm-patterns/SKILL.md`

Quality
Discovery
14%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description is essentially a comma-separated list of broad topic areas with no concrete actions, no explicit trigger guidance, and high overlap risk with other AI-related skills. It fails to communicate what specific tasks the skill performs or when Claude should select it, making it poorly suited for skill selection among many candidates.
Suggestions
- Add a 'Use when...' clause with explicit trigger scenarios, e.g., 'Use when the user asks about building AI-powered applications, writing evals for LLM outputs, or managing prompt templates and versions.'
- Replace abstract topic labels with concrete actions, e.g., 'Designs AI-first application architectures, creates evaluation suites for LLM outputs, versions and organizes prompt templates.'
- Include natural keyword variations users might say, such as 'evals', 'prompt engineering', 'prompt versioning', 'model evaluation', 'AI app architecture', '.prompt files'.
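Applied together, the suggestions above might yield a SKILL.md frontmatter description like the following. The exact wording and field names are an illustrative assumption, not output from the review:

```yaml
---
name: llm-patterns
description: >
  Designs AI-first application architectures, creates evaluation suites for
  LLM outputs, and versions and organizes prompt templates. Use when the user
  asks about building AI-powered applications, writing evals for LLM outputs,
  prompt engineering, prompt versioning, model evaluation, or .prompt files.
---
```

Note how the rewrite replaces topic labels with verbs ('designs', 'creates', 'versions'), adds an explicit 'Use when' clause, and folds in the keyword variations a user might actually type.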
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description lists broad topic areas ('AI-first application patterns', 'LLM testing', 'prompt management') but does not describe any concrete actions. These are abstract domain labels, not specific capabilities like 'generate test cases for LLM outputs' or 'version and organize prompt templates'. | 1 / 3 |
| Completeness | The description only loosely addresses 'what' (and even that is vague topic labels rather than actions). There is no 'when' clause or explicit trigger guidance whatsoever, which per the rubric caps completeness at 2 at best, and the weak 'what' brings it to 1. | 1 / 3 |
| Trigger Term Quality | Terms like 'LLM testing', 'prompt management', and 'AI-first' are somewhat relevant keywords a user might mention, but they miss many natural variations (e.g., 'evaluate model outputs', 'prompt engineering', 'prompt versioning', 'AI app development', 'evals'). Coverage is partial. | 2 / 3 |
| Distinctiveness / Conflict Risk | The terms are very broad and could overlap with many skills related to AI development, testing frameworks, or prompt engineering. 'AI-first application patterns' is especially generic and could conflict with any AI/ML-related skill. | 1 / 3 |
| Total | | 5 / 12 Passed |
Implementation
64%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a solid, actionable skill with excellent executable code examples covering LLM client patterns, prompt management, testing strategies, and CI integration. Its main weaknesses are verbosity (some sections could be trimmed or split into referenced files) and the lack of explicit error handling workflows and validation checkpoints in the multi-step processes. The anti-patterns section is a nice touch but partially redundant with the demonstrated patterns.
Suggestions
- Add explicit error handling and retry workflows for LLM calls (try/catch around JSON.parse, schema validation failure recovery, timeout handling) to improve workflow clarity.
- Split detailed sections (testing patterns, CI config, cost tracking) into separate referenced files to improve progressive disclosure and reduce the monolithic nature of the document.
- Trim the 'Core Principle' section — Claude doesn't need to be told what LLMs vs traditional code are good at; focus on the project-specific conventions instead.
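The first suggestion can be sketched as a retry wrapper around an LLM call that guards JSON.parse and validates the parsed shape before returning. This is a hypothetical sketch, not code from the skill: `callModel`, the `Draft` shape, and the attempt count are all assumptions, and a real implementation might use Zod's safeParse in place of the hand-written type guard.

```typescript
type Draft = { title: string; summary: string };

// Stand-in for a real LLM client call; returns a raw string completion.
// (Hypothetical: the skill's actual client wrapper is not reproduced here.)
async function callModel(prompt: string): Promise<string> {
  return JSON.stringify({ title: "Hello", summary: "A greeting." });
}

// Minimal structural validator; a real skill might use a Zod schema here.
function isDraft(value: unknown): value is Draft {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as Draft).title === "string" &&
    typeof (value as Draft).summary === "string"
  );
}

async function generateDraft(prompt: string, maxAttempts = 3): Promise<Draft> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callModel(prompt);
    try {
      const parsed = JSON.parse(raw); // throws on malformed JSON
      if (isDraft(parsed)) return parsed; // schema check before returning
      lastError = new Error("schema validation failed");
    } catch (err) {
      lastError = err; // malformed JSON: fall through and retry
    }
  }
  throw lastError; // all attempts exhausted
}
```

The key point is that both failure modes (malformed JSON and well-formed JSON with the wrong shape) feed the same retry loop, so the caller only ever sees a valid `Draft` or a single final error.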
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is fairly well-structured but includes some content Claude already knows (e.g., what to use LLMs vs code for is fairly obvious to Claude). The project structure section is verbose and could be trimmed. The anti-patterns list at the end is somewhat redundant given the patterns already shown. However, the code examples are dense and informative. | 2 / 3 |
| Actionability | The skill provides fully executable TypeScript code examples throughout — typed LLM wrapper, Zod schemas, prompt templates, test files at all three levels (mocks, fixtures, evals), GitHub Actions YAML, and cost tracking. All examples are copy-paste ready with realistic implementations. | 3 / 3 |
| Workflow Clarity | The skill presents a clear testing pyramid (unit → fixture → eval) and CI workflow, but lacks explicit validation checkpoints or error recovery steps. For example, the LLM client pattern shows JSON.parse without try/catch, and there's no explicit workflow for 'what to do when schema validation fails' or retry logic. The overall sequence of building an LLM-powered feature is implicit rather than explicitly sequenced. | 2 / 3 |
| Progressive Disclosure | The content is well-sectioned with clear headers, but it's a monolithic document (~200+ lines of code examples) that could benefit from splitting detailed patterns (testing, CI config, cost tracking) into separate reference files. The header mentions 'Load with: base.md + [language].md', suggesting a multi-file system exists, but no cross-references to other files are provided within the body. | 2 / 3 |
| Total | | 9 / 12 Passed |
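The 'unit with mocks' level of the testing pyramid described above can be sketched as follows. All names here are illustrative assumptions rather than code from the skill: the feature under test receives a fake LLM client through its interface, so the test is fast, deterministic, and free of API costs.

```typescript
// Hypothetical interface the feature depends on (injected, not imported).
interface LlmClient {
  complete(prompt: string): Promise<string>;
}

// Feature under test: summarize text via the injected client.
async function summarize(client: LlmClient, text: string): Promise<string> {
  const raw = await client.complete(`Summarize: ${text}`);
  return raw.trim();
}

// Mock client returning a canned response; the test asserts on behavior
// (trimming, prompt construction) rather than on model wording.
const mockClient: LlmClient = {
  complete: async () => "  A short summary.  ",
};

async function testSummarizeTrimsOutput(): Promise<void> {
  const result = await summarize(mockClient, "long article text");
  if (result !== "A short summary.") throw new Error("trim assertion failed");
}
```

Fixture and eval tests would sit above this in the pyramid, swapping the mock for recorded responses and live model calls respectively.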
Validation
90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 Passed |
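As the warning text itself suggests, the usual fix is to move any non-spec top-level keys under `metadata`. A hypothetical before/after sketch (the `owner` key is an invented example of an unknown key, not one found in this skill):

```yaml
# Before: unknown top-level key triggers frontmatter_unknown_keys
---
name: llm-patterns
description: ...
owner: platform-team
---

# After: the unknown key is nested under metadata
---
name: llm-patterns
description: ...
metadata:
  owner: platform-team
---
```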