Evaluate the quality and efficacy of existing tests by reviewing test code against source code. Use when the user asks to review tests, validate test quality, audit test suites, check test efficacy, or assess whether tests are testing things properly. Prioritizes real interactions over mocking and simulation.
Quality: 79% (does it follow best practices?)
Impact: — (no eval scenarios have been run)
Status: Passed, no known issues
Optimize this skill with Tessl:

`npx tessl skill review --optimize ./plugins/test-evaluator/skills/test-evaluator/SKILL.md`

Quality
Discovery: 82%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a solid description that clearly communicates both what the skill does and when to use it, with good trigger term coverage. The main weakness is that the specific capabilities could be more granular—listing concrete evaluation actions like checking assertion quality, identifying missing edge cases, or detecting over-mocking would strengthen specificity. The distinctiveness could also be improved by more clearly differentiating from general code review or test-writing skills.
Suggestions
Add more specific concrete actions like 'identifies missing edge cases, evaluates assertion quality, detects over-mocking, checks coverage gaps' to improve specificity.
Add differentiating language to reduce overlap with code review or test-writing skills, e.g., 'Does not write new tests; focuses on evaluating existing test suites.'
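The over-mocking suggestion above can be illustrated with a small, hypothetical pytest-style example; `parse_csv` is an invented function standing in for real source code, not something taken from the skill itself:

```python
from unittest.mock import MagicMock

def parse_csv(text):
    """Toy stand-in for the source code under test."""
    return [line.split(",") for line in text.strip().splitlines()]

# Anti-pattern: the mock replaces the unit under test, so the assertion
# only verifies the mock's canned return value, never the parsing logic.
def test_parse_csv_overmocked():
    fake_parse = MagicMock(return_value=[["a", "b"]])
    assert fake_parse("anything at all") == [["a", "b"]]  # always passes

# Preferred: exercise the real function with real input.
def test_parse_csv_real():
    assert parse_csv("a,b\nc,d") == [["a", "b"], ["c", "d"]]
```

An evaluator following this skill would flag the first test as testing the mock rather than the code.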
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description names the domain (test evaluation) and some actions ('reviewing test code against source code'), but doesn't list multiple specific concrete actions like identifying missing edge cases, checking assertion quality, evaluating coverage gaps, or detecting redundant tests. | 2 / 3 |
| Completeness | Clearly answers both 'what' (evaluate quality and efficacy of existing tests by reviewing test code against source code) and 'when' (explicit 'Use when' clause listing multiple trigger scenarios). Also includes a philosophical note about prioritizing real interactions over mocking. | 3 / 3 |
| Trigger Term Quality | Good coverage of natural terms users would say: 'review tests', 'test quality', 'audit test suites', 'test efficacy', 'testing things properly'. These are phrases users would naturally use when seeking this kind of help. | 3 / 3 |
| Distinctiveness / Conflict Risk | While focused on test evaluation specifically, it could overlap with general code review skills or test-writing skills. The distinction between 'reviewing tests' and 'writing tests' or 'reviewing code' could cause some ambiguity, though the explicit triggers help somewhat. | 2 / 3 |
| Total | | 10 / 12 (Passed) |
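As a sketch of the assertion-quality check mentioned in the table, here is a hypothetical weak-versus-strong pair; `total_price` is invented purely for illustration:

```python
def total_price(items):
    """Toy stand-in for source code: items is a list of (qty, price) pairs."""
    return sum(qty * price for qty, price in items)

# Weak assertion: passes for almost any implementation that doesn't crash.
def test_total_weak():
    assert total_price([(2, 3.0)]) is not None

# Strong assertions: pin down exact values, including the empty edge case.
def test_total_strong():
    assert total_price([(2, 3.0), (1, 4.0)]) == 10.0
    assert total_price([]) == 0
```

The weak test would survive a badly broken implementation; the strong test would not.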
Implementation: 77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, well-structured skill that provides clear, actionable guidance for evaluating test quality. Its main strengths are the concrete evaluation workflow, well-defined output format with examples, and the practical anti-patterns reference. Its primary weakness is length — the language-specific guidance and philosophy sections add bulk that could be trimmed or externalized, and some testing principles stated are things Claude would already know.
Suggestions
Trim the 'Core Philosophy' section to a brief numbered list without explanatory sentences — Claude already understands these testing concepts.
Extract the 'Language & Framework Guidance' and 'Anti-Patterns Cheat Sheet' sections into separate reference files (e.g., FRAMEWORK_GUIDANCE.md, ANTI_PATTERNS.md) and link to them from the main skill.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is generally well-written but has some verbosity. The 'Core Philosophy' section explains testing principles Claude likely already knows. The language/framework guidance section is extensive and could be more condensed. However, the anti-patterns table and evaluation criteria are dense and useful. | 2 / 3 |
| Actionability | The skill provides highly concrete, actionable guidance: a specific evaluation workflow with clear steps, a defined output format with example tables, specific verdicts with definitions, and concrete examples of good vs bad patterns per framework. The output format is copy-paste ready with markdown table templates. | 3 / 3 |
| Workflow Clarity | The 5-step evaluation workflow is clearly sequenced and logical: read tests → read source → evaluate each test → identify gaps → produce report. Each step has explicit criteria and the output format serves as a natural validation checkpoint ensuring thoroughness. For a review/analysis task (non-destructive), this level of workflow clarity is excellent. | 3 / 3 |
| Progressive Disclosure | The content is well-structured with clear sections and headers, but it's a long monolithic file (~180 lines) with no references to external files. The language-specific guidance and anti-patterns cheat sheet could reasonably be split into separate reference files to keep the main skill leaner, though the lack of bundle files means this is the only option currently. | 2 / 3 |
| Total | | 10 / 12 (Passed) |
Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation checks: 11 / 11 passed. Validation of the skill structure reported no warnings or errors.