improve-test-coverage

Improve test coverage for shell features and commands using reference test suites from yash, GNU coreutils, and uutils/coreutils

Quality

47%

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Advisory

Suggest reviewing before use

Optimize this skill with Tessl

npx tessl skill review --optimize ./.claude/skills/improve-test-coverage/SKILL.md

Quality

Discovery

40%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description identifies a clear and distinctive niche (shell test coverage using specific reference test suites), which reduces conflict risk. However, it lacks explicit trigger guidance ('Use when...'), lists only one vague action ('improve test coverage'), and misses common natural language terms users might employ when requesting this kind of help.

Suggestions

Add an explicit 'Use when...' clause, e.g., 'Use when the user wants to add, port, or improve tests for shell builtins, coreutils commands, or POSIX compliance.'

List more specific concrete actions, e.g., 'Ports reference tests from yash/GNU coreutils/uutils test suites, identifies coverage gaps, writes new test cases for shell builtins and commands.'

Include additional natural trigger terms like 'unit tests', 'bash', 'POSIX', 'shell scripts', 'coreutils testing', 'test porting' to improve keyword coverage.

Dimension	Reasoning	Score
Specificity	Names the domain (test coverage for shell features/commands) and references specific test suites (yash, GNU coreutils, uutils/coreutils), but doesn't list concrete actions beyond 'improve test coverage' — e.g., it doesn't specify writing tests, porting tests, analyzing coverage gaps, etc.	2 / 3
Completeness	Describes what it does (improve test coverage using reference test suites) but has no explicit 'Use when...' clause or equivalent trigger guidance. Per the rubric, a missing 'Use when...' clause caps completeness at 2, and the 'what' itself is also somewhat vague, placing this at 1.	1 / 3
Trigger Term Quality	Includes relevant keywords like 'test coverage', 'shell', 'yash', 'GNU coreutils', 'uutils/coreutils', but misses common user-facing terms like 'unit tests', 'shell scripts', 'bash', 'POSIX', or file extensions. A user asking about shell testing might not use these exact reference suite names.	2 / 3
Distinctiveness Conflict Risk	The combination of shell test coverage with specific reference suites (yash, GNU coreutils, uutils/coreutils) creates a very clear niche that is unlikely to conflict with other skills. This is highly distinctive.	3 / 3
	Total	8 / 12 Passed

Implementation

55%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill is extraordinarily thorough and actionable: every step has concrete commands, decision tables, and clear sequencing with validation checkpoints and crash recovery. However, it is severely over-long and monolithic. It crams more than 600 lines of detailed reference material (gap category tables, layer-selection rubrics, YAML format specs, commit templates) into a single file with no progressive disclosure. This verbosity significantly undermines token efficiency, because the skill repeats commands and explains concepts Claude already understands.

Suggestions

Extract the gap category tables (Step 5), layer-selection rubric (Step 6), and YAML format reference (Step 6) into separate bundle files (e.g., GAP_CATEGORIES.md, LAYER_SELECTION.md, SCENARIO_FORMAT.md) and reference them from the main skill.

Remove redundant explanations of basic operations Claude already knows (git add/commit/push workflows, how to run bash commands, what YAML is) — replace with terse command blocks only.

Consolidate repeated bash command patterns (e.g., the 'find tests/scenarios/...' commands appear in Steps 4, 7, 8, and 9) into a single reference section or bundle file.

Trim the security preamble to 2-3 sentences — Claude understands prompt injection; the detailed examples of injection payloads and the <external-data> framing are unnecessary verbosity.

Dimension	Reasoning	Score
Conciseness	Extremely verbose at more than 600 lines, with heavy repetition (e.g., the same bash/git commands repeated across steps, the gap category tables, the layer-selection table). Many instructions are procedural scaffolding that Claude could infer (e.g., 'Before starting step N, call TaskList and verify step N-1 is completed'). The security preamble, while important, is also lengthy. The skill explains concepts Claude already knows (how to use git, how to run bash commands, what YAML is).	1 / 3
Actionability	Highly actionable with concrete, executable bash commands, specific file paths, exact YAML formats with examples, precise git commit message templates, and detailed decision tables for every classification step. Every step has copy-paste-ready commands and clear criteria for decision-making.	3 / 3
Workflow Clarity	Exceptionally clear three-phase workflow (Setup → Per-target loop → Finalization) with explicit step numbering, dependency ordering, validation checkpoints (Step 10 runs tests, Step 12 runs CI fixes), feedback loops (fix and re-validate), and a durable progress tracker (COVERAGE_PROGRESS.md) for crash recovery. The resume protocol is well-defined.	3 / 3
Progressive Disclosure	Monolithic wall of text with no bundle files to offload detail into. The gap category tables, layer-selection rubric, YAML format reference, and per-step bash commands could all be split into separate reference files. Everything is inlined into a single massive document with no external references for detailed content like the YAML format spec or the gap analysis rubric.	1 / 3
	Total	8 / 12 Passed

Validation

81%

Showing warnings and errors only.

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 9 / 11 Passed

Validation for skill structure

Criteria	Description	Result
skill_md_line_count	SKILL.md is long (789 lines); consider splitting into references/ and linking	Warning
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning

	Total	9 / 11 Passed
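Both warnings can be reproduced locally before re-running the review. The following is a minimal Python sketch, not Tessl's actual validator. It assumes a 500-line threshold and an allowed set of top-level frontmatter keys (`name`, `description`, `license`, `compatibility`, `metadata`, `allowed-tools`). Both are assumptions, so adjust them to the spec you target. The function name `check_skill` is hypothetical.

```python
import sys
from pathlib import Path

# ASSUMPTION: threshold chosen for illustration; the review does not state Tessl's cut-off.
MAX_LINES = 500
# ASSUMPTION: allowed top-level frontmatter keys; verify against the skill spec you target.
ALLOWED_KEYS = {"name", "description", "license", "compatibility", "metadata", "allowed-tools"}


def check_skill(text: str) -> list[str]:
    """Return warning strings for an overlong file or unknown frontmatter keys (hypothetical helper)."""
    warnings: list[str] = []
    lines = text.splitlines()
    if len(lines) > MAX_LINES:
        warnings.append(
            f"skill_md_line_count: {len(lines)} lines; consider splitting into references/"
        )
    # YAML frontmatter sits between an opening '---' on line 1 and the next '---'.
    if lines and lines[0].strip() == "---":
        end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
        if end is not None:
            keys = {
                line.split(":", 1)[0].strip()
                for line in lines[1:end]
                # Top-level keys are unindented 'key: value' lines; comments are skipped.
                if line and not line[0].isspace() and ":" in line and not line.startswith("#")
            }
            unknown = sorted(keys - ALLOWED_KEYS)
            if unknown:
                warnings.append(f"frontmatter_unknown_keys: {', '.join(unknown)}")
    return warnings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_skill.py .claude/skills/improve-test-coverage/SKILL.md
    for warning in check_skill(Path(sys.argv[1]).read_text(encoding="utf-8")):
        print("Warning:", warning)
```

The key check is deliberately shallow. It reads only unindented `key:` lines rather than parsing YAML, which keeps the sketch dependency-free but would miss unusual frontmatter layouts.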

Repository: DataDog/rshell
Commit: 00bdc03

Reviewed: 5 days ago
