improve-test-coverage

Improve test coverage for shell features and commands using reference test suites from yash, GNU coreutils, and uutils/coreutils

Quality

—

Does it follow best practices?

Impact

—

No eval scenarios have been run

Securityby

Advisory

Suggest reviewing before use

Quality

Content

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The content is a highly actionable, clearly sequenced multi-phase workflow with strong validation checkpoints and feedback loops. Its weaknesses are verbosity from process-coaching asides and a monolithic structure that keeps reference material inline rather than splitting it into bundle files.

Suggestions

Trim the meta process-coaching (the 'STOP — READ THIS' preamble and repeated 'do not stop voluntarily' injunctions) to the essential rule, cutting tokens that do not advance the workflow.

Move the stable reference material — the YAML scenario format spec, the gap-category tables, and the bash-comparison CLI mirror table — into a references/ bundle file linked from the main body, so SKILL.md stays a lean overview and progressive disclosure improves.

Dimension	Reasoning	Score
Conciseness	The ~785-line body is mostly efficient operational detail (commands, paths, templates), but it is padded with process-coaching blocks like the 'STOP — READ THIS' section and repeated anti-stop injunctions ('don't stop because you feel low on context budget'), so it sits at 'mostly efficient but could be tightened' rather than the lean level-3 anchor.	2 / 3
Actionability	It provides concrete, copy-paste-ready guidance throughout: exact bash/grep commands, file paths, a YAML scenario template, commit-message templates, and a CLI-flag mirror table for bash comparison, matching the 'fully executable code/commands; copy-paste ready' anchor.	3 / 3
Workflow Clarity	A clear three-phase (A/B/C), 14-step sequence is laid out with explicit validation checkpoints (re-read sub-bullets before marking completed, Step 10 test run, Step 12 fix-ci-tests + 'gh pr checks' confirmation) and feedback loops (diff bash vs our shell, re-validate), matching the level-3 anchor and not the level-2 anchor whose checkpoints are merely implicit.	3 / 3
Progressive Disclosure	No bundle files exist (references/scripts/assets are all absent) and the body references no offloaded reference files; everything — gap taxonomy tables, the YAML format spec, CLI mirror table — lives inline in a single ~785-line document, so it lands at 'some structure but content that should be separate is inline' rather than the well-split level-3 anchor.	2 / 3
	Total	10 / 12 Passed

Description

67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is specific and distinctive, clearly scoping a narrow test-coverage niche with named reference suites. Its main weakness is the absence of an explicit 'Use when...' trigger clause, which leaves the when-to-invoke guidance only implied and caps completeness.

Suggestions

Add an explicit 'Use when...' clause naming the natural trigger phrases (e.g. 'Use when improving test coverage for shell builtins or language features, or when the user asks to find missing test cases').

Include common user-facing variations of the trigger terms (e.g. 'test coverage', 'missing tests', 'coverage gaps', 'shell command tests') so the description matches how a user would naturally phrase the request.

Dimension	Reasoning	Score
Specificity	Names a concrete domain ('test coverage for shell features and commands') and multiple specific actions/targets ('mining reference test suites from yash, GNU coreutils, and uutils/coreutils'), matching the 'lists multiple specific concrete actions' anchor rather than the level-2 anchor which omits the named suites.	3 / 3
Completeness	The 'what' is clearly stated (improve test coverage using reference suites) but the 'when' is only implied with no 'Use when...' clause, which per the judging guidelines caps completeness at 2 rather than reaching the explicit-trigger level 3.	2 / 3
Trigger Term Quality	Contains relevant natural terms ('test coverage', 'shell features', 'commands', 'reference test suites') but lacks common variations a user would say and offers no explicit trigger phrasing, so it lands at 'some relevant keywords but missing common variations' rather than the fuller level-3 coverage.	2 / 3
Distinctiveness Conflict Risk	The niche is narrow and concrete (shell test coverage via the specific yash/GNU/uutils suites), making it clearly distinguishable and unlikely to trigger for the wrong skill, matching the 'clear niche with distinct triggers' anchor.	3 / 3
	Total	10 / 12 Passed

Validation

81%

Warnings & errors only

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 13 / 16 Passed

Validation for skill structure

Criteria	Description	Result
skill_md_line_count	SKILL.md is long (789 lines); consider splitting into references/ and linking	Warning
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning
referenced_paths_exist	Referenced path issues: 1 missing	Warning

	Total	13 / 16 Passed

Repository: DataDog/rshell
Commit: a0a1140

Reviewed: about 16 hours ago

Table of Contents

Discovery Implementation Validation

Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.