Audit, upgrade, and maintain Grove test suites. Use when the user asks to "audit the test suite", "find untested examples", "upgrade dependencies", "check suite health", "find dead code", "clean up the test suite", "maintain Grove", "what examples are missing tests", or wants to analyze and improve the overall health of a Grove test suite.
89
88%
Does it follow best practices?
Impact
86%
0.97x
Average score across 3 eval scenarios
Advisory
Suggest reviewing before use
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly defines its scope (Grove test suite auditing and maintenance), provides extensive natural trigger terms users would actually say, and explicitly separates 'what' from 'when'. The description is concise, uses third person voice, and is highly distinctive due to the Grove-specific domain focus.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'Audit, upgrade, and maintain Grove test suites' covers auditing, upgrading dependencies, and maintaining. The trigger phrases further elaborate specific capabilities like finding untested examples, checking suite health, finding dead code, and cleaning up. | 3 / 3 |
| Completeness | Clearly answers both 'what' (audit, upgrade, and maintain Grove test suites, analyze and improve overall health) and 'when' with an explicit 'Use when...' clause containing numerous specific trigger phrases. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms users would say: 'audit the test suite', 'find untested examples', 'upgrade dependencies', 'check suite health', 'find dead code', 'clean up the test suite', 'maintain Grove', 'what examples are missing tests'. These are natural phrases a user would actually type. | 3 / 3 |
| Distinctiveness Conflict Risk | Highly distinctive due to the specific 'Grove' domain qualifier and the focused niche of test suite auditing/maintenance. The combination of 'Grove' + test suite health/audit/maintenance creates a clear, non-conflicting niche that is unlikely to overlap with general testing or general code maintenance skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured, highly actionable skill that provides clear workflows for three distinct maintenance modes. Its greatest strength is the specificity of guidance — concrete commands, grep patterns, decision trees, and structured output templates for every step. The main weakness is its length; at ~350 lines with no bundle files, it could benefit from extracting reference tables and per-language details into supporting files, and trimming some explanatory prose that over-justifies decisions to Claude.
Suggestions
Extract the language reference table, release notes locations table, and anti-pattern grep patterns into separate reference files (e.g., `references/language-map.md`, `references/release-notes.md`) to reduce the main skill's token footprint.
Trim explanatory justifications aimed at Claude (e.g., why `nc -zv` is insufficient, what transitive dependencies are, why baseline matters) — state the rule directly without the rationale.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is thorough and mostly efficient for its complexity, but some sections are verbose, e.g., the Upgrade Mode Step 0 preflight section explains at length why `nc -zv` is insufficient, and Step 2 includes a large table of release notes locations that could be a separate reference file. Some guidance (like explaining what transitive dependencies are) assumes less than Claude's competence. | 2 / 3 |
| Actionability | Highly actionable throughout: concrete grep patterns, exact CLI commands per language, specific file paths, structured output templates, and precise decision trees (e.g., 1-3 failures vs 4+ failures). The language reference table with export/import patterns is immediately executable. | 3 / 3 |
| Workflow Clarity | All three modes have clearly numbered, sequenced steps with explicit validation checkpoints. Upgrade mode includes a preflight baseline, runtime regression heuristic (5× baseline), explicit approval gates before applying changes, and clear branching logic for different failure counts. Cleanup mode requires user approval before executing any action. Audit mode builds a coverage map methodically before reporting. | 3 / 3 |
| Progressive Disclosure | The skill is a single large file (~350 lines) with no bundle files to offload detail into. The language reference tables, release notes locations, and anti-pattern grep patterns could be split into separate reference files. However, the content is well-organized with clear section headers and mode separation, and it does reference external files like CLAUDE.md and convention files appropriately. | 2 / 3 |
| Total | | 10 / 12 Passed |
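The two upgrade-mode thresholds named in the Workflow Clarity and Actionability rows (a 5× baseline runtime heuristic, and a split between 1-3 and 4+ failures) can be sketched as below. This is a minimal illustration, not the skill's own code: the function names, the bucket labels, and the choice of a strict "greater than" comparison are all assumptions, since the review states only the thresholds and not what the skill does in each branch.

```python
def is_runtime_regression(baseline_seconds: float,
                          current_seconds: float,
                          factor: float = 5.0) -> bool:
    """Return True when a run takes more than `factor` times the preflight baseline.

    The 5x factor comes from the review; the strict '>' comparison is an assumption.
    """
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return current_seconds > factor * baseline_seconds


def failure_bucket(failure_count: int) -> str:
    """Classify a failure count into the ranges the review mentions.

    The labels 'none', 'few', and 'many' are hypothetical; the review only
    distinguishes 1-3 failures from 4+ failures.
    """
    if failure_count < 0:
        raise ValueError("failure_count must be non-negative")
    if failure_count == 0:
        return "none"
    if failure_count <= 3:
        return "few"
    return "many"
```

A caller would record the baseline during the preflight step, then compare each post-upgrade run against it before deciding which failure branch to follow.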
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
5985af5