Use this skill when the user asks to "run tests", "test this", "check if tests pass", "cargo test", "run clippy", "lint this", "check formatting", "cargo fmt", "CI checks", "verify changes", "does this pass tests", "run the full check", "pre-commit check", or wants to verify that code changes are correct. Use this even when the user says something like "make sure this works" or "check for issues" in the context of code changes.
68%
Does it follow best practices?
Impact: Pending (no eval scenarios have been run)
Passed: No known issues
Optimize this skill with Tessl: `npx tessl skill review --optimize ./.claude/skills/run-tests/SKILL.md`
Quality
Discovery
37%: Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description is heavily lopsided: it provides excellent trigger term coverage for when to use the skill, but completely omits what the skill actually does. A reader cannot tell what concrete actions this skill performs (e.g., running specific commands, checking specific outputs). The description reads as a pure trigger-matching list rather than a functional description.
Suggestions
Add a clear 'what it does' statement at the beginning, e.g., 'Runs Rust CI checks including cargo test, cargo clippy, and cargo fmt to verify code correctness, lint issues, and formatting compliance.'
Trim the trigger list to the most distinctive terms and consolidate into a concise 'Use when...' clause rather than listing every possible phrase verbatim.
Rewrite in the third-person voice to describe capabilities (e.g., 'Executes Rust test suites and linting checks'), replacing the current imperative 'Use this skill when...' framing that omits the skill's actual functionality.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description lists no concrete actions or capabilities. It never says what the skill actually does (e.g., 'runs cargo test, clippy, and cargo fmt'); it only describes when to use it. | 1 / 3 |
| Completeness | While the 'when' is thoroughly covered with extensive trigger phrases, the 'what' is entirely missing. The description never explains what the skill actually does (what commands it runs, what outputs it produces, what tools it uses). This is a critical gap that makes it incomplete. | 1 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms users would say: 'run tests', 'cargo test', 'run clippy', 'lint this', 'check formatting', 'cargo fmt', 'CI checks', 'pre-commit check', and even softer phrases like 'make sure this works' and 'check for issues'. | 3 / 3 |
| Distinctiveness / Conflict Risk | The Rust-specific terms like 'cargo test', 'cargo fmt', and 'clippy' help distinguish it, but broader phrases like 'check for issues' and 'make sure this works' are very generic and could easily conflict with other code-quality or debugging skills. | 2 / 3 |
| Total | | 7 / 12 Passed |
Implementation
100%: Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is an excellent skill that is lean, actionable, and well-structured. It provides project-specific knowledge (exact CI commands, serde debugging tips, integration test caveats) without explaining anything Claude already knows. The decision tree for what to run after different types of changes is particularly valuable.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Every section earns its place. There is no explanation of what cargo is, what tests are, or how Rust works. The quick reference table is dense and useful. The 'Interpreting Failures' section provides project-specific knowledge Claude wouldn't have. | 3 / 3 |
| Actionability | All commands are copy-paste ready with exact flags (e.g., `cargo clippy --locked -- -D warnings`). The full CI check command is directly executable. Specific test-targeting examples use real module names from the project. | 3 / 3 |
| Workflow Clarity | The 'Full CI Check' section provides the exact CI sequence with clear ordering via `&&` chaining. 'What to Run After a Change' provides a decision tree based on what changed. The 'Interpreting Failures' section serves as a feedback loop for error recovery, with specific remediation steps for each failure type. | 3 / 3 |
| Progressive Disclosure | For a skill of this scope (single-purpose, under 100 lines, no bundle), the content is well organized into logical sections that progress from quick reference → full check → targeted checks → troubleshooting. The cross-reference to the `add-command` skill for writing tests is appropriate scoping. | 3 / 3 |
| Total | | 12 / 12 Passed |
Validation
72%: Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation: 8 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| metadata_version | 'metadata.version' is missing | Warning |
| metadata_field | 'metadata' should map string keys to string values | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing them or moving them to metadata | Warning |
| Total | | 8 / 11 Passed |
defdc4d