OWASP LLM Top 10 (2025) audit checklist for AI applications, agent tools, RAG pipelines, and prompt construction. Use when performing any security review touching LLM client code, prompt templates, agent tools, or vector stores.
Overall score: 88 (Quality: 87%)
Does it follow best practices?

Impact: Pending. No eval scenarios have been run.
Status: Passed. No known issues.
Quality

Discovery: 89%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a well-structured description with a clear 'Use when' clause and strong trigger terms covering the LLM security audit domain. Its main weakness is that it describes the skill as a 'checklist' without enumerating the specific actions or checks it performs (e.g., detecting prompt injection, checking for data poisoning, validating output handling). The domain specificity and explicit trigger guidance make it effective for skill selection despite the moderate action specificity.
Suggestions
Add 2-3 concrete actions the checklist performs, e.g., 'Checks for prompt injection vulnerabilities, validates output handling, reviews data poisoning risks, and audits sensitive information disclosure in RAG pipelines.'
| Dimension | Reasoning | Score |
|---|---|---|
Specificity | Names the domain (OWASP LLM Top 10 security audit) and mentions specific areas like AI applications, agent tools, RAG pipelines, and prompt construction, but doesn't list concrete actions beyond 'audit checklist' — it doesn't specify what the skill actually does (e.g., 'checks for prompt injection vulnerabilities, validates input sanitization, reviews retrieval pipelines for data leakage'). | 2 / 3 |
Completeness | Clearly answers both 'what' (OWASP LLM Top 10 audit checklist for AI applications, agent tools, RAG pipelines, prompt construction) and 'when' (explicit 'Use when performing any security review touching LLM client code, prompt templates, agent tools, or vector stores'). | 3 / 3 |
Trigger Term Quality | Includes strong natural trigger terms that users would actually say: 'security review', 'LLM', 'prompt templates', 'agent tools', 'vector stores', 'RAG pipelines', 'OWASP'. These cover a good range of terms a user performing security work on LLM applications would naturally use. | 3 / 3 |
Distinctiveness / Conflict Risk | Highly distinctive — the combination of OWASP LLM Top 10, security audit, and the specific scope (LLM client code, prompt templates, agent tools, vector stores) creates a clear niche that is unlikely to conflict with general coding, security, or document skills. | 3 / 3
Total | | 11 / 12 Passed
Implementation: 85%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-crafted security checklist skill that is concise, well-organized, and appropriately structured with progressive disclosure. Its main weakness is the lack of concrete code examples showing vulnerable vs. secure patterns, which would make it more actionable for Claude when actually reviewing code. The workflow prioritization (LLM01 → LLM06 → rest) and severity marking system are strong.
Suggestions
Add 1-2 concrete code examples showing a vulnerable pattern and its fix (e.g., string-concatenated prompt injection vs. separate user turn, or an uncapped agent loop vs. one with a depth limit).
Include a brief example of what a completed checklist entry looks like with the ✅/⚠️/🔴 marking system applied to a real finding.
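As a sketch of the first suggestion, here is a hypothetical before/after in Python using OpenAI-style chat message dicts. The function names, roles, and prompt text are illustrative, not taken from the skill itself:

```python
def build_messages_vulnerable(user_input: str) -> list[dict]:
    # Anti-pattern: untrusted user text is concatenated into the system
    # prompt, so input like "ignore previous instructions" becomes part
    # of the trusted instruction context (OWASP LLM01: Prompt Injection).
    return [{
        "role": "system",
        "content": f"You are a support bot. Answer this: {user_input}",
    }]


def build_messages_safer(user_input: str) -> list[dict]:
    # Safer pattern: the system prompt never interpolates user-controlled
    # text; untrusted input is isolated in its own user turn.
    return [
        {"role": "system", "content": "You are a support bot."},
        {"role": "user", "content": user_input},
    ]
```

Isolating user input in a separate turn does not eliminate prompt injection, but it stops user text from masquerading as trusted system instructions, which is the specific anti-pattern the suggestion targets.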
| Dimension | Reasoning | Score |
|---|---|---|
Conciseness | Every token earns its place. The table format is maximally dense, anti-patterns are terse, and there's no explanation of what LLMs are or how prompt injection works conceptually — it assumes Claude already knows these things and just needs the checklist. | 3 / 3 |
Actionability | The checklist provides clear detection signals and a marking system (✅/⚠️/🔴), but lacks concrete code examples showing what vulnerable vs. secure patterns look like (e.g., a before/after for prompt injection, or a specific code snippet for max_tokens enforcement). The guidance is specific enough to direct attention but not fully executable. | 2 / 3 |
Workflow Clarity | The workflow is clearly sequenced: check LLM01 first, then LLM06, then mark each item with a status. The scoring impact of P0 findings is explicit ('caps Security score at 40/100'), and the instruction 'not skip any item' serves as a completeness checkpoint. For a checklist-style skill, this is well-structured. | 3 / 3 |
Progressive Disclosure | The SKILL.md serves as a concise overview with a single, clearly signaled reference to 'references/owasp-llm.md' for full detection signals. This is exactly one level deep with clear navigation, and the inline content is appropriately scoped as a summary. | 3 / 3 |
Total | | 11 / 12 Passed
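The uncapped-agent-loop example mentioned in the suggestions could be sketched as follows. `call_model` and `run_tool` are hypothetical stand-ins for a real model client and tool dispatcher, and the step cap is an illustrative value:

```python
MAX_AGENT_STEPS = 8  # illustrative cap, not a value from the skill


def run_agent(task, call_model, run_tool):
    """Toy agent loop with a hard depth limit.

    An uncapped `while True` loop lets adversarial input drive unbounded
    tool calls; bounding the loop and failing closed limits the blast
    radius (excessive agency / unbounded consumption).
    """
    observation = task
    for _ in range(MAX_AGENT_STEPS):
        action = call_model(observation)
        if action["type"] == "final":
            return action["content"]
        observation = run_tool(action)
    # Fail closed rather than silently truncating the run.
    raise RuntimeError(f"agent exceeded {MAX_AGENT_STEPS} steps")
```

Raising on overflow (rather than returning a partial answer) keeps a runaway loop visible to callers and audit logs, which is usually what a security review wants to see.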
Validation: 81%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
metadata_version | 'metadata.version' is missing | Warning |
metadata_field | 'metadata' should map string keys to string values | Warning |
Total | | 9 / 11 Passed
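A frontmatter sketch that would clear both warnings, assuming the skill declares metadata in SKILL.md YAML frontmatter (the name, version, and owner values are illustrative):

```yaml
---
name: owasp-llm-audit          # illustrative name
description: OWASP LLM Top 10 (2025) audit checklist for AI applications
metadata:
  version: "1.0.0"             # adds the missing metadata.version
  owner: "security-team"       # string keys mapped to string values,
                               # per the second warning
---
```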