grove-run

Run Grove tests and diagnose failures. Use when the user asks to "run the tests", "run my test", "debug this test failure", "why is this test failing", "check if tests pass", or wants to execute and troubleshoot code example tests.

66

Quality

79%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./.claude/skills/grove-run/SKILL.md
Quality

Content

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured, highly actionable skill for running and debugging Grove tests. Its greatest strengths are the clear multi-step workflow with validation checkpoints, concrete diagnostic patterns with specific symptoms and fixes, and thorough edge case coverage. The main weakness is verbosity — particularly the extensive handoff file validation logic in Step 0 and some explanatory text in failure categories that could be more concise.

Suggestions

Consider moving the Step 0 handoff file validation logic into a separate reference file, as it's a secondary path that adds significant length to the main skill.

Tighten the failure category descriptions by removing phrases Claude can infer (e.g., 'Explain which sample database is needed' — Claude knows to do this from the symptom context).

Dimension | Reasoning | Score

Conciseness

The skill is fairly long (~200+ lines) with some sections that could be tightened — the handoff file validation in Step 0 is quite verbose with extensive error message templates, and the failure classification categories, while useful, include some explanatory text Claude could infer. However, most content is genuinely instructive and not redundant.

2 / 3

Actionability

The skill provides concrete, executable commands (npm test, pytest, go test, etc.), specific diagnostic patterns with exact symptoms and fixes, and clear code examples like Jest timeout syntax and Expect chain modifiers. The failure categories include specific error messages to match against and precise remediation steps.

3 / 3

Workflow Clarity

The 6-step workflow is clearly sequenced with explicit validation checkpoints: Step 0 validates handoff files with specific checks, Step 3 captures output, Step 4 parses and classifies failures, Step 5 includes re-running tests after fixes to confirm, and the edge cases section covers error recovery scenarios. The feedback loop of diagnose → fix → re-run is explicit.

3 / 3

Progressive Disclosure

The skill references external files (language-specific CLAUDE.md, conventions files with 'Comparison API' sections) which is good progressive disclosure, but the main file itself is quite long with all failure categories inline. The failure diagnosis categories (Connection Error, Output Mismatch, etc.) could potentially be split into a reference file, though they are central enough to justify inclusion. No bundle files are provided to verify referenced paths.

2 / 3

Total: 10 / 12

Passed
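
The review notes that the skill's failure categories (for example Connection Error and Output Mismatch) include specific error messages to match against. The skill's actual symptom strings are not reproduced here, so the following is only a minimal sketch of that idea under stated assumptions: the regular expressions, the `classify_failure` helper, and the `Unclassified` fallback are all hypothetical and are not taken from the skill itself.

```python
import re

# Hypothetical symptom patterns. Only the category names come from the review;
# the real skill may match entirely different messages.
CATEGORY_PATTERNS = {
    "Connection Error": re.compile(r"ECONNREFUSED|connection refused", re.IGNORECASE),
    "Output Mismatch": re.compile(r"expected .* (to equal|but got|received)", re.IGNORECASE),
}


def classify_failure(output: str) -> str:
    """Return the first category whose pattern appears in the test output."""
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.search(output):
            return category
    # Fallback label (an assumption) for output that matches no known symptom.
    return "Unclassified"


if __name__ == "__main__":
    print(classify_failure("Error: connect ECONNREFUSED 127.0.0.1:27017"))
    print(classify_failure("Expected 3 to equal 4"))
```

Keeping the patterns in one table like this is one way to address the conciseness concern: the matching logic stays short, and the per-category remediation text could live in a separate reference file.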

Description

82%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a solid description with strong trigger term coverage and good completeness, explicitly answering both what and when. Its main weaknesses are moderate specificity in describing concrete actions (beyond run/diagnose) and some overlap risk with other test-related skills due to generic trigger phrases like 'run the tests'. Adding more specific capabilities and clarifying what makes Grove tests distinct would strengthen it.

Suggestions

Add more specific concrete actions beyond 'run' and 'diagnose', e.g., 'Executes Grove test suites, parses failure output, identifies root causes, and suggests fixes for code example tests.'

Strengthen distinctiveness by briefly clarifying what Grove tests are or what differentiates them from other test frameworks, e.g., 'Grove documentation code example tests' to reduce conflict with general test-running skills.

Dimension | Reasoning | Score

Specificity

It names the domain ('Grove tests') and mentions running and diagnosing failures, but doesn't list multiple specific concrete actions beyond 'run' and 'diagnose'. It could be more specific about what diagnosing entails (e.g., parsing error output, suggesting fixes, identifying assertion mismatches).

2 / 3

Completeness

Clearly answers both 'what' (run Grove tests and diagnose failures) and 'when' (explicit 'Use when...' clause with multiple trigger scenarios). The when clause is detailed and explicit.

3 / 3

Trigger Term Quality

Excellent coverage of natural trigger phrases: 'run the tests', 'run my test', 'debug this test failure', 'why is this test failing', 'check if tests pass', and 'execute and troubleshoot code example tests'. These are phrases users would naturally say.

3 / 3

Distinctiveness Conflict Risk

The term 'Grove tests' provides some specificity, but generic phrases like 'run the tests' and 'debug this test failure' could overlap with other testing-related skills (e.g., unit test runners, integration test skills). The 'Grove' qualifier helps but the trigger phrases are quite broad.

2 / 3

Total: 10 / 12

Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository
mongodb/docs
Reviewed
