"A comprehensive verification system for Claude Code sessions."
50%
Impact: Pending (no eval scenarios have been run)
Quality (does it follow best practices?): Passed (no known issues)
Discovery: 0%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description is critically underspecified. It provides no concrete actions, no natural trigger terms, no 'when to use' guidance, and is so generic that it would be nearly impossible for Claude to correctly select this skill from a pool of alternatives. It reads more like a vague tagline than a functional skill description.
Suggestions
- List specific concrete actions the skill performs, e.g., 'Runs automated tests, validates outputs, checks for regressions in Claude Code session results.'
- Add an explicit 'Use when...' clause with natural trigger terms, e.g., 'Use when the user asks to verify, validate, test, or check the correctness of a Claude Code session.'
- Clarify what 'verification' means in this context: what is being verified, against what criteria, and what the expected outputs are. This distinguishes the skill from general testing or debugging skills.
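Taken together, these suggestions point toward frontmatter along the lines of the following sketch. The skill name and exact wording are hypothetical illustrations, not the reviewed skill's actual file.

```yaml
---
name: session-verifier   # hypothetical name, for illustration only
description: >-
  Runs automated checks on Claude Code session results: builds the
  project, type-checks, lints, runs tests, and reviews the diff for
  security issues. Use when the user asks to verify, validate, test,
  or check the correctness of a Claude Code session or its changes.
---
```

A description in this shape names concrete actions, carries natural trigger terms ('verify', 'validate', 'test', 'check'), and states explicitly when the skill applies.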
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description uses vague, abstract language ('comprehensive verification system') without listing any concrete actions. It does not specify what is being verified, how, or what outputs are produced. | 1 / 3 |
| Completeness | The description barely addresses 'what' (a verification system, but for what?) and completely omits 'when': there is no 'Use when...' clause or any explicit trigger guidance. | 1 / 3 |
| Trigger Term Quality | The only potentially relevant terms are 'verification' and 'Claude Code sessions', which are not natural keywords a user would say. There are no actionable trigger terms like 'test', 'validate', 'check', or specific task-related vocabulary. | 1 / 3 |
| Distinctiveness / Conflict Risk | The description is extremely generic. 'Verification system for Claude Code sessions' could overlap with testing, linting, code review, debugging, or any quality-assurance skill, making it highly conflict-prone. | 1 / 3 |
| Total | | 4 / 12 Passed |
Implementation: 77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a solid, actionable verification workflow with clear sequential phases and concrete commands. Its main weaknesses are some unnecessary explanatory sections (When to Use, Continuous Mode, Integration with Hooks) that don't add much value, and the content could benefit from splitting language-specific commands into separate references. The core verification loop is well-structured with appropriate stop-gates.
Suggestions
- Remove or significantly trim the 'When to Use', 'Continuous Mode', and 'Integration with Hooks' sections: they explain things Claude can infer and add little actionable value.
- Consider splitting language-specific commands (JS/TS vs Python) into separate referenced files to reduce noise when working in a single-language project.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Generally efficient but includes some unnecessary sections like 'When to Use' (Claude knows when to verify) and the vague 'Continuous Mode' section. The 'Integration with Hooks' section adds little value. The core phases are reasonably lean, though. | 2 / 3 |
| Actionability | Each phase provides concrete, executable bash commands with specific tools and flags. The output format template is copy-paste ready, and the commands include practical touches like piping to tail/head for manageable output. | 3 / 3 |
| Workflow Clarity | Clear sequential phases with explicit stop-gates ('If build fails, STOP and fix before continuing'). The workflow progresses logically from build → types → lint → tests → security → diff review, with validation checkpoints and a structured output report that serves as a final checklist. | 3 / 3 |
| Progressive Disclosure | Content is all inline in a single file, which is borderline acceptable given the length (~90 lines of content). However, the security scan patterns, language-specific commands, and output format template could be split into referenced files for better organization. The structure within the file is good, with clear headers. | 2 / 3 |
| Total | | 10 / 12 Passed |
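The stop-gated phase sequence praised in the table can be sketched as a small shell wrapper. This is an illustrative sketch only, not the skill's actual implementation: `true` stands in for each phase's real command, and the example commands in the comments are typical JS/TS choices, not confirmed contents of the skill.

```shell
#!/bin/sh
# Sketch of a stop-gated verification loop (assumed structure; the real
# skill's commands differ). `true` is a placeholder for each phase's
# actual command, suggested in the trailing comments.

run_phase() {
  # Run one phase; on any failure, STOP immediately (the stop-gate).
  name=$1
  shift
  echo "== $name =="
  if ! "$@"; then
    echo "$name failed: STOP and fix before continuing" >&2
    exit 1
  fi
}

run_phase "Build" true      # e.g. npm run build 2>&1 | tail -20
run_phase "Types" true      # e.g. npx tsc --noEmit
run_phase "Lint"  true      # e.g. npx eslint .
run_phase "Tests" true      # e.g. npm test
run_phase "Security" true   # e.g. scan the diff for hardcoded secrets
echo "All phases passed; review the diff and write the report"
```

Because each phase exits on failure, later phases never run against a broken build, which is exactly the property the stop-gates enforce.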
Validation: 90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation for skill structure: 10 / 11 Passed
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |
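Following the warning's own suggestion, the fix is to nest unrecognized top-level keys under `metadata`. The sketch below is hypothetical: the skill name, description, and the `version`/`author` keys are illustrative examples of keys a validator might flag, not the actual contents of the reviewed file.

```yaml
---
name: session-verifier          # hypothetical skill name
description: Verifies the results of a Claude Code session.
# Unknown top-level keys (e.g. `version:`, `author:`) trigger the
# frontmatter_unknown_keys warning; the warning suggests moving them
# under `metadata` instead:
metadata:
  version: "1.0"
  author: example-user
---
```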