grove-run

Run Grove tests and diagnose failures. Use when the user asks to "run the tests", "run my test", "debug this test failure", "why is this test failing", "check if tests pass", or wants to execute and troubleshoot code example tests.

Quality: 79%. Does it follow best practices?

Impact: not scored. No eval scenarios have been run.

Security: Passed. No known issues.

Optimize this skill with Tessl

npx tessl skill review --optimize ./.claude/skills/grove-run/SKILL.md

Quality

Discovery

82%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a solid description with strong trigger term coverage and a clear 'Use when...' clause that makes it easy for Claude to know when to select this skill. Its main weaknesses are moderate specificity in describing what concrete actions the skill performs beyond running and diagnosing, and some potential overlap with other test-related skills due to generic testing trigger phrases.

Suggestions

Add more specific concrete actions to the 'what' portion, e.g., 'Run Grove tests, parse error output, identify failing assertions, and suggest fixes for code example tests.'

Clarify what makes Grove tests distinct from other test frameworks to reduce conflict risk with other testing skills.

Dimension	Reasoning	Score
Specificity	It names the domain ('Grove tests') and mentions running and diagnosing failures, but doesn't list multiple specific concrete actions beyond 'run' and 'diagnose'. It could be more specific about what diagnosing entails (e.g., parsing error output, suggesting fixes, identifying assertion mismatches).	2 / 3
Completeness	Clearly answers both 'what' (run Grove tests and diagnose failures) and 'when' (explicit 'Use when...' clause with multiple trigger phrases). The when clause is detailed and actionable.	3 / 3
Trigger Term Quality	Excellent coverage of natural trigger terms: 'run the tests', 'run my test', 'debug this test failure', 'why is this test failing', 'check if tests pass', 'execute and troubleshoot code example tests'. These are phrases users would naturally say.	3 / 3
Distinctiveness Conflict Risk	The term 'Grove tests' provides some specificity, but generic phrases like 'run the tests' and 'debug this test failure' could overlap with other testing-related skills (e.g., unit test runners, integration test skills). The niche is somewhat clear but not fully distinct.	2 / 3
	Total	10 / 12 Passed

Implementation

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured, highly actionable skill for running and debugging Grove tests. Its greatest strengths are the clear multi-step workflow with validation checkpoints, concrete diagnostic patterns with specific symptoms and fixes, and thorough edge case coverage. The main weakness is verbosity — particularly the extensive Step 0 handoff validation logic and inline failure categories that make the file long, though most content earns its place.

Suggestions

Consider moving the detailed failure category diagnostics (Connection Error, Output Mismatch, etc.) into a separate reference file like GROVE-DIAGNOSTICS.md to reduce the main skill's token footprint while preserving discoverability.

Tighten Step 0 by reducing the example error messages to templates rather than full prose — Claude can generate appropriate user-facing messages from a brief pattern.

Dimension	Reasoning	Score
Conciseness	The skill is fairly long (~200+ lines) with some sections that could be tightened — the handoff file validation in Step 0 is quite verbose with extensive error message templates, and the failure classification categories, while useful, include some explanatory text Claude could infer. However, most content is genuinely instructive and not redundant.	2 / 3
Actionability	The skill provides concrete, executable commands (npm test, pytest, go test, etc.), specific diagnostic patterns with exact symptoms and fixes, and clear code examples like Jest timeout syntax and Expect chain modifiers. The failure categories include specific error messages to match against and precise remediation steps.	3 / 3
Workflow Clarity	The 6-step workflow is clearly sequenced with explicit validation checkpoints: Step 0 validates handoff files with specific checks, Step 3 captures output, Step 4 parses and classifies failures, Step 5 includes re-running tests after fixes to confirm, and the edge cases section covers error recovery scenarios. The feedback loop of diagnose → fix → re-run is explicit.	3 / 3
Progressive Disclosure	The skill references external files (language-specific CLAUDE.md, conventions files with 'Comparison API' sections) which is good progressive disclosure, but the main file itself is quite long with all failure categories inline. The failure diagnosis categories (Connection Error, Output Mismatch, etc.) could potentially be split into a reference file, though they are central enough to justify inclusion. No bundle files are provided to verify referenced paths.	2 / 3
	Total	10 / 12 Passed
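The Implementation review credits the skill with an explicit loop: capture test output (Step 3), parse and classify failures (Step 4), then fix and re-run to confirm (Step 5). The Python sketch below illustrates that loop under stated assumptions. The function names (`run_tests`, `classify`, `run_fix_rerun`) and the regular expressions are hypothetical and are not taken from the skill's SKILL.md. Only the category names Connection Error and Output Mismatch come from the review. The command is whatever the project uses, for example `["npm", "test"]`, `["pytest"]`, or `["go", "test", "./..."]`.

```python
import re
import subprocess
from dataclasses import dataclass
from typing import Callable

# Hypothetical symptom patterns for two categories named in the review.
# The real skill's matching rules live in its SKILL.md and may differ.
FAILURE_PATTERNS = {
    "Connection Error": re.compile(
        r"ECONNREFUSED|connection refused|server selection timed out", re.IGNORECASE
    ),
    "Output Mismatch": re.compile(
        r"expected .* (but got|received)|does not match", re.IGNORECASE
    ),
}


@dataclass
class RunResult:
    command: list[str]
    returncode: int
    output: str  # combined stdout and stderr


def run_tests(command: list[str], timeout: float = 600.0) -> RunResult:
    """Run a test command and capture its output (the 'capture' step)."""
    proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    return RunResult(command, proc.returncode, proc.stdout + proc.stderr)


def classify(output: str) -> list[str]:
    """Return every category whose pattern matches, or ['Unclassified']."""
    matches = [name for name, pattern in FAILURE_PATTERNS.items() if pattern.search(output)]
    return matches or ["Unclassified"]


def run_fix_rerun(
    command: list[str],
    apply_fix: Callable[[list[str], str], bool],
    max_fixes: int = 3,
) -> bool:
    """Diagnose, fix, and re-run until tests pass or no fix is applied.

    apply_fix is a hypothetical callback: it receives the categories and the
    raw output, and returns True if it changed something worth re-running.
    """
    result = run_tests(command)
    for _ in range(max_fixes):
        if result.returncode == 0:
            return True
        if not apply_fix(classify(result.output), result.output):
            return False
        result = run_tests(command)  # re-run to confirm the fix
    return result.returncode == 0
```

Keeping the symptom patterns in a single table also fits the reviewer's suggestion to move the failure categories into a separate reference file such as GROVE-DIAGNOSTICS.md: the table could be loaded from that file rather than hard-coded.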

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: mongodb/docs
Commit: 5985af5

Reviewed: 3 days ago
