Validate test effectiveness with mutation testing using Stryker (TypeScript/JavaScript with Vitest or bun test via @hughescr/stryker-bun-runner) and mutmut (Python). Find weak tests that pass despite code mutations. Use to improve test quality.
81
73%
Does it follow best practices?
Impact
93%
1.43x
Average score across 3 eval scenarios
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./plugins/mutation-testing/skills/mutation-testing/SKILL.md
Quality
Discovery
82%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong description with excellent specificity and trigger term coverage for a well-defined niche (mutation testing). The main weakness is the lack of an explicit 'Use when...' clause with concrete trigger scenarios, which would help Claude know exactly when to select this skill. The description uses proper third-person voice throughout.
Suggestions
Replace 'Use to improve test quality' with an explicit 'Use when...' clause, e.g., 'Use when the user asks about mutation testing, wants to evaluate test suite effectiveness, mentions Stryker or mutmut, or wants to find tests that don't catch code changes.'
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple concrete actions: validate test effectiveness, mutation testing, find weak tests that pass despite code mutations, improve test quality. Names specific tools (Stryker, mutmut) and frameworks (Vitest, bun test, @hughescr/stryker-bun-runner). | 3 / 3 |
| Completeness | Clearly answers 'what' (validate test effectiveness with mutation testing, find weak tests) but the 'when' is only implied via 'Use to improve test quality' rather than an explicit 'Use when...' clause with trigger scenarios. Per rubric guidelines, missing explicit 'Use when...' clause caps completeness at 2. | 2 / 3 |
| Trigger Term Quality | Includes strong natural keywords users would say: 'mutation testing', 'Stryker', 'mutmut', 'test quality', 'weak tests', 'TypeScript', 'JavaScript', 'Python', 'Vitest', 'bun test'. Good coverage of both tool names and conceptual terms. | 3 / 3 |
| Distinctiveness Conflict Risk | Mutation testing is a very specific niche. The mention of specific tools (Stryker, mutmut) and the concept of 'mutations' and 'weak tests' make this highly distinctive and unlikely to conflict with general testing or code quality skills. | 3 / 3 |
| Total | | 11 / 12 Passed |
Implementation
64%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a solid, actionable skill with excellent concrete examples and executable code for both TypeScript and Python mutation testing. Its main weaknesses are moderate verbosity (explaining concepts Claude already knows, like mutation types and score interpretation) and a workflow section that lacks explicit validation/error-recovery checkpoints. The content would benefit from trimming generic knowledge and splitting detailed runner configurations into separate files.
Suggestions
Remove or significantly trim the 'Core Concept', 'Common Mutation Types', and 'Mutation Score Targets' sections — Claude already understands these concepts and they consume tokens without adding actionable value.
Add explicit validation checkpoints to the workflow: what to check after each step, how to diagnose common failures (e.g., config errors, runner not found), and when to retry.
Consider splitting Bun runner configuration details into a separate reference file to keep the main SKILL.md as a concise overview with links to detailed setup guides.
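One way the suggested validation checkpoint could look is sketched below. It assumes the score definition Stryker documents (detected mutants, meaning killed plus timed out, divided by all valid mutants); mutmut reports results differently, so its counts would need mapping first. The `MutationCounts` class, the `checkpoint` function, and the threshold of 80 are hypothetical and not taken from the skill itself.

```python
from dataclasses import dataclass


@dataclass
class MutationCounts:
    """Result counts from one mutation run (hypothetical container)."""
    killed: int
    timeout: int
    survived: int
    no_coverage: int


def mutation_score(c: MutationCounts) -> float:
    """Percentage of valid mutants that the test suite detected."""
    detected = c.killed + c.timeout
    valid = detected + c.survived + c.no_coverage
    if valid == 0:
        raise ValueError("no valid mutants were produced")
    return 100.0 * detected / valid


def checkpoint(c: MutationCounts, threshold: float = 80.0) -> str:
    """Decide the next workflow step; the threshold is an assumed value."""
    try:
        score = mutation_score(c)
    except ValueError:
        # Zero mutants usually points at configuration, not at the tests.
        return "fix-config"
    return "ok" if score >= threshold else "add-tests"


run = MutationCounts(killed=42, timeout=3, survived=10, no_coverage=5)
print(mutation_score(run), checkpoint(run))
```

Returning a small set of named outcomes keeps the feedback loop explicit: `fix-config` sends the agent back to setup, `add-tests` sends it to the surviving mutants, and `ok` ends the loop.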
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is mostly efficient but includes some unnecessary content like the 'Core Concept' definitions (Claude knows what mutation testing is), the 'Common Mutation Types' section (basic knowledge), and the mutation score targets table which is generic guidance. The Bun runner section is well-detailed but the 'Key Behaviors' explanations are somewhat verbose. | 2 / 3 |
| Actionability | Provides fully executable installation commands, complete configuration files, runnable CLI commands, and concrete before/after test examples. Everything is copy-paste ready for both Stryker (Vitest and Bun runners) and mutmut. | 3 / 3 |
| Workflow Clarity | The workflow section at the end provides a clear sequence but lacks explicit validation checkpoints and error recovery steps. There's no guidance on what to do if mutation testing fails to run, if configuration is wrong, or how to interpret and act on specific error messages. The numbered steps are present but the feedback loop is implicit rather than explicit. | 2 / 3 |
| Progressive Disclosure | The content is reasonably structured with clear sections, and references 'vitest-testing' and 'test-quality-analysis' at the end. However, the file is quite long (~180 lines of content) and could benefit from splitting detailed configuration (especially the Bun runner specifics) into separate reference files. The Common Mutation Types and score targets tables could also be externalized. | 2 / 3 |
| Total | | 9 / 12 Passed |
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| Total | 10 / 11 Passed | |
88da5ff