Run Grove tests and diagnose failures. Use when the user asks to "run the tests", "run my test", "debug this test failure", "why is this test failing", "check if tests pass", or wants to execute and troubleshoot code example tests.
66
79%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./.claude/skills/grove-run/SKILL.md
Quality
Discovery
82%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a solid description with strong trigger term coverage and a clear 'Use when...' clause that makes it easy for Claude to know when to select this skill. Its main weaknesses are moderate specificity in describing what concrete actions the skill performs beyond running and diagnosing, and some potential overlap with other test-related skills due to generic testing trigger phrases.
Suggestions
Add more specific concrete actions to the 'what' portion, e.g., 'Run Grove tests, parse error output, identify failing assertions, and suggest fixes for code example tests.'
Clarify what makes Grove tests distinct from other test frameworks to reduce conflict risk with other testing skills.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | It names the domain ('Grove tests') and mentions running and diagnosing failures, but doesn't list multiple specific concrete actions beyond 'run' and 'diagnose'. It could be more specific about what diagnosing entails (e.g., parsing error output, suggesting fixes, identifying assertion mismatches). | 2 / 3 |
| Completeness | Clearly answers both 'what' (run Grove tests and diagnose failures) and 'when' (explicit 'Use when...' clause with multiple trigger phrases). The when clause is detailed and actionable. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms: 'run the tests', 'run my test', 'debug this test failure', 'why is this test failing', 'check if tests pass', 'execute and troubleshoot code example tests'. These are phrases users would naturally say. | 3 / 3 |
| Distinctiveness / Conflict Risk | The term 'Grove tests' provides some specificity, but generic phrases like 'run the tests' and 'debug this test failure' could overlap with other testing-related skills (e.g., unit test runners, integration test skills). The niche is somewhat clear but not fully distinct. | 2 / 3 |
| Total | Passed | 10 / 12 |
Implementation
77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured, highly actionable skill for running and debugging Grove tests. Its greatest strengths are the clear multi-step workflow with validation checkpoints, concrete diagnostic patterns with specific symptoms and fixes, and thorough edge case coverage. The main weakness is verbosity — particularly the extensive Step 0 handoff validation logic and inline failure categories that make the file long, though most content earns its place.
Suggestions
Consider moving the detailed failure category diagnostics (Connection Error, Output Mismatch, etc.) into a separate reference file like GROVE-DIAGNOSTICS.md to reduce the main skill's token footprint while preserving discoverability.
Tighten Step 0 by replacing the full-prose example error messages with short templates; Claude can generate appropriate user-facing messages from a brief pattern.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is fairly long (~200+ lines) with some sections that could be tightened: the handoff file validation in Step 0 is quite verbose with extensive error message templates, and the failure classification categories, while useful, include some explanatory text Claude could infer. However, most content is genuinely instructive and not redundant. | 2 / 3 |
| Actionability | The skill provides concrete, executable commands (npm test, pytest, go test, etc.), specific diagnostic patterns with exact symptoms and fixes, and clear code examples like Jest timeout syntax and Expect chain modifiers. The failure categories include specific error messages to match against and precise remediation steps. | 3 / 3 |
| Workflow Clarity | The 6-step workflow is clearly sequenced with explicit validation checkpoints: Step 0 validates handoff files with specific checks, Step 3 captures output, Step 4 parses and classifies failures, Step 5 includes re-running tests after fixes to confirm, and the edge cases section covers error recovery scenarios. The feedback loop of diagnose → fix → re-run is explicit. | 3 / 3 |
| Progressive Disclosure | The skill references external files (language-specific CLAUDE.md, conventions files with 'Comparison API' sections) which is good progressive disclosure, but the main file itself is quite long with all failure categories inline. The failure diagnosis categories (Connection Error, Output Mismatch, etc.) could potentially be split into a reference file, though they are central enough to justify inclusion. No bundle files are provided to verify referenced paths. | 2 / 3 |
| Total | Passed | 10 / 12 |
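The "parse and classify failures" step that the review credits can be pictured with a small sketch. This is only an illustration, not the skill's actual logic: the category names "Connection Error" and "Output Mismatch" come from the review, while the function name `classify_failure`, the regular expressions, and the "Unclassified" fallback are assumptions made for the example.

```python
import re

# Hypothetical patterns: the review names the categories but not the
# strings the skill matches against, so these regexes are assumptions.
CATEGORY_PATTERNS = [
    ("Connection Error", re.compile(r"ECONNREFUSED|connection refused", re.IGNORECASE)),
    ("Output Mismatch", re.compile(r"expected .* (but )?(got|received)|AssertionError", re.IGNORECASE)),
]


def classify_failure(output: str) -> str:
    """Return the first category whose pattern appears in the captured test output."""
    for category, pattern in CATEGORY_PATTERNS:
        if pattern.search(output):
            return category
    # Fallback label (also an assumption) for output that matches no known pattern.
    return "Unclassified"
```

Keeping the categories in a single ordered table like this is one way such diagnostics could live in a separate reference file, as the suggestion above proposes, without changing the matching code.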
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
5985af5