When the user needs a security assessment — threat modeling, vulnerability review, auth flow audit, dependency scanning, or says "is this secure", "review for vulnerabilities", "threat model", "security audit", "pen test prep".
Score: 83
80% (Does it follow best practices?)
Impact: 86% (1.30x average score across 3 eval scenarios)
Risky: do not use without reviewing
Optimize this skill with Tessl: npx tessl skill review --optimize ./skills/security-review/SKILL.md
Quality
Discovery
82%. Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description excels at trigger term coverage and distinctiveness, providing excellent natural-language phrases users would say when needing security assessments. However, it leans heavily toward describing 'when' to use the skill while underspecifying 'what' the skill actually does — it lists assessment types but doesn't describe concrete outputs or actions (e.g., 'generates threat models', 'produces vulnerability reports'). The structure is inverted from the ideal pattern of 'what it does' followed by 'use when'.
Suggestions
Add explicit 'what' actions describing outputs, e.g., 'Performs security assessments including generating threat models, identifying vulnerabilities, auditing authentication flows, and scanning dependencies for known CVEs.'
Restructure to lead with capabilities first, then trigger conditions: 'Generates threat models, identifies vulnerabilities, audits auth flows, scans dependencies. Use when...'
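As a rough sketch of both suggestions combined (assuming the standard SKILL.md frontmatter with name and description keys; the wording is illustrative, not a prescribed fix):

```markdown
---
name: security-review
description: >
  Performs security assessments: generates threat models, produces
  vulnerability reports, audits authentication flows, and scans
  dependencies for known CVEs. Use when the user asks "is this secure",
  "review for vulnerabilities", "threat model", "security audit", or
  "pen test prep".
---
```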
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: threat modeling, vulnerability review, auth flow audit, dependency scanning. These are distinct, well-defined security assessment activities. | 3 / 3 |
| Completeness | The 'when' is very well covered with explicit trigger phrases, but the 'what' (what the skill actually does/produces) is only implied through the list of assessment types. It doesn't clearly state what actions the skill performs or what output it generates — it describes when to use it more than what it does. | 2 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms users would say: 'is this secure', 'review for vulnerabilities', 'threat model', 'security audit', 'pen test prep'. These are highly natural phrases a user would actually type. | 3 / 3 |
| Distinctiveness / Conflict Risk | Security assessment is a clear niche with distinct trigger terms like 'threat model', 'security audit', 'pen test prep', and 'vulnerability review'. These are unlikely to conflict with general code review or other development skills. | 3 / 3 |
| Total | | 11 / 12 Passed |
Implementation
77%. Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a comprehensive and highly actionable security review skill with a well-defined five-phase workflow and explicit validation checkpoints. Its main weakness is length — the inline reference material (OWASP Top 10, STRIDE explanations, CVSS guide) inflates the token cost significantly, and much of this is knowledge Claude already possesses. The skill would benefit from extracting reference checklists into separate files while keeping the workflow and output format in the main SKILL.md.
Suggestions
Extract the OWASP Top 10 checks, STRIDE details, CVSS scoring guide, and Auth Flow Checklist into separate reference files (e.g., OWASP_CHECKS.md, AUTH_CHECKLIST.md) and link to them from the main skill to reduce token cost.
Trim explanations of well-known concepts (e.g., what each STRIDE category means, what OWASP Top 10 items are) to just the specific, non-obvious checks and thresholds that add value beyond Claude's existing knowledge.
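A minimal sketch of the suggested split (the references/ layout and the STRIDE and CVSS file names are hypothetical; only OWASP_CHECKS.md and AUTH_CHECKLIST.md are named above):

```markdown
## Reference material

Load only the checklist needed for the current phase:

- [OWASP Top 10 checks](references/OWASP_CHECKS.md)
- [Auth flow checklist](references/AUTH_CHECKLIST.md)
- [STRIDE category details](references/STRIDE.md)
- [CVSS scoring guide](references/CVSS_GUIDE.md)
```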
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is generally well-structured but includes some content Claude already knows (OWASP Top 10 descriptions, basic CVSS definitions, what STRIDE categories mean). The checklists and framework sections could be more concise, as Claude is familiar with these security concepts. However, the specific tool commands and thresholds add genuine value. | 2 / 3 |
| Actionability | Provides specific, executable commands (semgrep, npm audit, pip-audit, trivy, govulncheck), concrete code examples in the output snippet, specific configuration values (argon2id cost >= 10, 15-min access tokens, 5 attempts/15 min rate limiting), and a complete output template. The example with JWT findings is copy-paste ready with file:line references. | 3 / 3 |
| Workflow Clarity | The five-phase workflow is clearly sequenced with explicit ordering constraints (automated before manual, authorization before active testing). Phase 4 serves as a validation checkpoint (eliminating false positives, contextual assessment). The mandatory constraints section reinforces the feedback loop and safety gates. | 3 / 3 |
| Progressive Disclosure | The skill references related skills (code-review, architecture-design, soc2-prep) for chaining, which is good. However, the OWASP Top 10, STRIDE details, Auth Flow Checklist, and CVSS scoring guide are all inline, making this a lengthy monolithic document. These reference sections could be split into separate files with clear pointers. | 2 / 3 |
| Total | | 10 / 12 Passed |
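The Actionability row above credits the skill's concrete tool commands; a hedged sketch of what such an automated-scanning step can look like (the phase heading and exact flags are illustrative, not quoted from the skill):

```markdown
## Phase 1: Automated scanning

Run before any manual review; carry raw findings into the Phase 4 triage:

- `semgrep scan --config auto` for static analysis
- `npm audit` or `pip-audit` for dependency vulnerabilities
- `trivy fs .` for filesystem and container image scanning
- `govulncheck ./...` for Go modules
```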
Validation
90%. Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 Passed |
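To clear the frontmatter_unknown_keys warning, non-standard keys can be removed or nested under metadata, as the message suggests; a sketch with a hypothetical owner key:

```markdown
---
name: security-review
description: ...
metadata:
  owner: security-team  # hypothetical key moved out of the top level
---
```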