Production-ready patterns for building LLM applications. Covers RAG pipelines, agent architectures, prompt IDEs, and LLMOps monitoring. Use when designing AI applications, implementing RAG, building agents, or setting up LLM observability.
Overall: 57%. Does it follow best practices?

Impact: Pending (no eval scenarios have been run).
Advisory: Suggest reviewing before use.
## Quality

### Discovery: 82%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a solid description that clearly communicates both purpose and trigger conditions. Its main strength is the explicit 'Use when...' clause with good keyword coverage. Its weaknesses are that the capabilities are described at a category level rather than with concrete actions, and the broad scope covering multiple sub-domains increases potential overlap with more specialized skills.
#### Suggestions

- Replace 'Covers' with specific concrete actions, e.g., 'Implements vector-store-backed RAG pipelines, designs multi-step agent workflows, configures LLMOps dashboards and prompt versioning'.
- Narrow the scope or add distinguishing detail to reduce overlap risk, e.g., specify which frameworks or patterns are covered (LangChain, LlamaIndex, etc.) to differentiate from generic AI development skills.
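Combining both suggestions, the rewritten frontmatter description might look like the following sketch (the exact wording and framework names are illustrative, not prescribed by the review):

```yaml
---
name: llm-app-patterns
description: >-
  Implements vector-store-backed RAG pipelines (LangChain, LlamaIndex,
  ChromaDB), designs multi-step agent workflows with tool calling, and
  configures LLMOps dashboards and prompt versioning. Use when designing
  AI applications, implementing RAG, building agents, or setting up LLM
  observability.
---
```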
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (LLM applications) and several areas (RAG pipelines, agent architectures, prompt IDEs, LLMOps monitoring), but these are high-level categories rather than concrete actions. It says 'covers' rather than listing specific actionable tasks like 'implement retrieval-augmented generation with vector stores' or 'configure observability dashboards'. | 2 / 3 |
| Completeness | Clearly answers both 'what' (production-ready patterns for LLM apps covering RAG, agents, prompt IDEs, LLMOps) and 'when' with an explicit 'Use when...' clause listing four trigger scenarios: designing AI applications, implementing RAG, building agents, or setting up LLM observability. | 3 / 3 |
| Trigger Term Quality | Includes strong natural keywords users would say: 'RAG', 'agent architectures', 'LLM applications', 'AI applications', 'agents', 'LLM observability', 'prompt IDEs', 'LLMOps'. These cover a good range of terms a user building LLM-powered systems would naturally use. | 3 / 3 |
| Distinctiveness / Conflict Risk | While the LLM application focus is somewhat specific, terms like 'AI applications' and 'building agents' are broad enough to potentially overlap with general coding skills, AI/ML skills, or more specific agent-building skills. The scope is wide (RAG + agents + monitoring + prompt IDEs), which increases conflict risk with more specialized skills. | 2 / 3 |
| Total |  | 10 / 12 Passed |
### Implementation: 14%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill reads like a textbook chapter on LLM application patterns rather than actionable guidance for Claude. It is extremely verbose, explaining well-known concepts (RAG, ReAct, function calling) that Claude already understands, while providing pseudocode patterns that aren't executable against any real framework. The lack of validation steps, error handling workflows, and content organization makes this poorly suited as a skill file.
#### Suggestions

- Reduce content by 80%+: remove explanations of concepts Claude knows (what RAG is, what agents are) and focus only on project-specific conventions, preferred libraries, and concrete configuration values.
- Make code examples executable by choosing a specific stack (e.g., LangChain + ChromaDB + OpenAI) and providing complete, runnable snippets with imports and setup.
- Add validation checkpoints to workflows, e.g., 'After ingestion, verify chunk count and sample embeddings; after retrieval, check relevance scores before generation.'
- Split into multiple files: keep SKILL.md as a concise overview with decision matrix, and move RAG details, agent patterns, prompt IDE patterns, and LLMOps into separate referenced files.
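The validation-checkpoint suggestion above can be sketched as a small gate between the retrieval and generation steps of a RAG pipeline. The function name, result shape, and thresholds here are illustrative assumptions, not part of the reviewed skill:

```python
def check_retrieval(results, min_results=3, min_score=0.5):
    """Gate a RAG pipeline between retrieval and generation.

    `results` is a list of (chunk_text, relevance_score) pairs as
    returned by a retriever. Failing fast here is cheaper than
    generating an answer from weak or empty context.
    """
    if len(results) < min_results:
        raise ValueError(f"expected >= {min_results} chunks, got {len(results)}")
    best = max(score for _, score in results)
    if best < min_score:
        raise ValueError(f"best relevance {best:.2f} is below {min_score}")
    # Pass only the chunks that clear the threshold to the generation step.
    return [(text, score) for text, score in results if score >= min_score]
```

The same pattern applies after ingestion (assert expected chunk counts, spot-check embedding dimensions) so that failures surface at the step that caused them.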
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~500+ lines. Explains concepts Claude already knows well (RAG, ReAct, function calling, prompt chaining). Much of this is textbook-level LLM application knowledge that doesn't need to be spelled out. The vector DB comparison tables, embedding model listings, and metric dictionaries are reference material that bloats the context without adding novel insight. | 1 / 3 |
| Actionability | Code examples are present throughout but are largely pseudocode-level patterns rather than truly executable code. Functions reference undefined objects (llm, vector_db, bm25_search, embed) without imports or setup. The code illustrates patterns but isn't copy-paste ready for any specific framework or library. | 2 / 3 |
| Workflow Clarity | Despite covering complex multi-step processes (RAG pipelines, agent loops, production deployment), there are no validation checkpoints, no error recovery steps, and no explicit verification workflows. The RAG pipeline diagram shows a sequence but lacks any guidance on validating retrieval quality, checking embedding correctness, or testing the pipeline end-to-end. | 1 / 3 |
| Progressive Disclosure | This is a monolithic wall of text with all content inline. Five major sections with extensive code examples are crammed into a single file with no references to separate detailed documents. The Resources section links to external projects but doesn't split the skill's own content across files for better navigation. | 1 / 3 |
| Total |  | 5 / 12 Passed |
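By contrast, a copy-paste-ready snippet defines everything it calls. As a minimal sketch of the hybrid-retrieval pattern the Actionability row alludes to (combining keyword and vector search without undefined helpers like bm25_search or embed), reciprocal rank fusion merges two rankings by rank alone, so no score calibration between retrievers is needed; k=60 is the conventional RRF default:

```python
from collections import defaultdict

def rrf_merge(keyword_ranking, vector_ranking, k=60):
    """Merge two ranked lists of doc ids with reciprocal rank fusion.

    Each input is a list of doc ids ordered best-first. A doc scores
    1 / (k + rank) in each list it appears in, and the scores sum, so
    docs ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in (keyword_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

The rankings themselves would come from whatever concrete stack the skill commits to; the point is that every name in the snippet resolves.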
### Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Result: 10 / 11 checks passed.

#### Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (761 lines); consider splitting into references/ and linking | Warning |
| Total |  | 10 / 11 Passed |
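The skill_md_line_count check is easy to reproduce locally before resubmitting. This sketch assumes a 500-line warning threshold; the validator's real limit is not shown in the report, only that 761 lines triggered a warning:

```python
from pathlib import Path

def skill_md_line_count(path="SKILL.md", warn_at=500):
    """Count lines in a SKILL.md and flag when it should be split.

    warn_at is an assumed threshold, not the validator's documented
    limit. Returns the line count and a "Warning"/"Pass" status.
    """
    lines = len(Path(path).read_text().splitlines())
    return lines, ("Warning" if lines > warn_at else "Pass")
```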