Strategy-first testing for the developer inner loop. Derive tests from acceptance criteria, write at the right layer, run and interpret, debug flakiness and ordering issues. Match the conventions already in the codebase — your tests should look like they belong.
48
52%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Advisory
Suggest reviewing before use
Optimize this skill with Tessl
npx tessl skill review --optimize ./develop/skills/test/SKILL.md
Quality
Discovery
42%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description demonstrates strong specificity with multiple concrete actions related to test writing and debugging, and it conveys a clear philosophy ('strategy-first', 'match conventions'). However, it completely lacks a 'Use when...' clause, which significantly hurts completeness and makes it harder for Claude to know when to select this skill. The trigger terms are somewhat jargon-heavy and miss common natural language variations users would employ.
Suggestions
Add an explicit 'Use when...' clause, e.g., 'Use when the user asks to write tests, add test coverage, create unit/integration tests, or debug failing/flaky tests.'
Include more natural trigger terms users would say, such as 'unit test', 'integration test', 'test coverage', 'TDD', 'test-driven', 'pytest', 'jest', 'spec', or 'test suite'.
Consider replacing jargon like 'developer inner loop' with more universally understood phrasing to improve discoverability.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'Derive tests from acceptance criteria', 'write at the right layer', 'run and interpret', 'debug flakiness and ordering issues', 'Match the conventions already in the codebase'. These are concrete, actionable capabilities. | 3 / 3 |
| Completeness | Describes what the skill does reasonably well, but there is no explicit 'Use when...' clause or equivalent trigger guidance. Per the rubric, a missing 'Use when...' clause should cap completeness at 2, and here the 'when' is not even implied clearly enough to warrant a 2; it is entirely absent. | 1 / 3 |
| Trigger Term Quality | Contains some relevant keywords like 'tests', 'acceptance criteria', 'flakiness', 'debug', and 'inner loop', but misses common user terms like 'unit test', 'integration test', 'test coverage', 'TDD', 'pytest', 'jest', or file extensions. 'Strategy-first testing for the developer inner loop' uses somewhat jargon-heavy phrasing. | 2 / 3 |
| Distinctiveness Conflict Risk | The focus on testing is a recognizable niche, and details like 'acceptance criteria', 'flakiness', and 'ordering issues' help distinguish it. However, without explicit trigger conditions, it could overlap with general coding or debugging skills. | 2 / 3 |
| Total | 8 / 12 Passed | |
Implementation
62%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured, thoughtfully written testing skill that excels at workflow clarity and strategic framing. Its main weakness is the lack of concrete, executable examples — the body reads as a philosophy-of-testing guide rather than an actionable reference, with all specifics deferred to reference files that aren't available in the bundle. The writing is clean but could be tighter in places where stance-setting and failure modes repeat earlier points.
Suggestions
Add at least one concrete, executable test example (e.g., a sample AC → derived test case with actual code) in the Plan or Write section so the SKILL.md is actionable on its own without requiring reference files.
Include a minimal example of the plan output format (e.g., a table or checklist showing AC → test case → layer → status markers) to make the planning step copy-paste ready.
Trim the 'Your Stance' and 'Failure Modes' sections — several points (e.g., 'don't guess ACs', 'strategy before writing') are stated in both places; consolidate to reduce redundancy.
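To illustrate the kind of example the first two suggestions ask for, here is a minimal sketch of one acceptance criterion turned into unit-layer test cases. Everything in it is hypothetical and does not come from the reviewed skill: the criterion "AC-1", the `order_total` function, and the threshold and rate values are assumptions chosen only to show the shape of an AC-to-test derivation.

```python
from decimal import Decimal

# Hypothetical acceptance criterion (an assumption, not taken from the skill):
# AC-1: "An order of 100.00 or more gets a 10% discount; smaller orders get none."
DISCOUNT_THRESHOLD = Decimal("100.00")
DISCOUNT_RATE = Decimal("0.10")


def order_total(subtotal: Decimal) -> Decimal:
    """Return the payable total after applying the AC-1 discount rule."""
    if subtotal >= DISCOUNT_THRESHOLD:
        return (subtotal * (1 - DISCOUNT_RATE)).quantize(Decimal("0.01"))
    return subtotal.quantize(Decimal("0.01"))


# Unit-layer tests derived from AC-1: one case on each side of the
# boundary the criterion names, plus the boundary itself.
def test_order_below_threshold_is_not_discounted():
    assert order_total(Decimal("99.99")) == Decimal("99.99")


def test_order_at_threshold_is_discounted():
    assert order_total(Decimal("100.00")) == Decimal("90.00")


def test_order_above_threshold_is_discounted():
    assert order_total(Decimal("250.00")) == Decimal("225.00")
```

If the project uses pytest, functions named `test_*` like these would be collected automatically; in a codebase with different conventions, the same three cases would be written in whatever style the existing tests use.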
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is mostly efficient and well-written, but includes some philosophical framing ('Your sharpest move...', 'A test that passes forever regardless of code changes is noise — not signal') and stance-setting that, while valuable for tone, adds tokens beyond what's strictly necessary for actionable guidance. The failure modes section restates ideas already covered earlier. | 2 / 3 |
| Actionability | The skill provides a clear conceptual framework (four moves, plan from ACs, layer selection) but lacks concrete executable examples: no code snippets, no specific commands, no example test output. It describes what to do at a strategic level but delegates all concrete guidance to reference files that aren't provided in the bundle. | 2 / 3 |
| Workflow Clarity | The four-move workflow (Plan → Write → Run → Debug) is clearly sequenced with explicit validation checkpoints: verify the test fails for the right reason before implementing, run broader suite to check regressions, interpret failures honestly with a decision tree (code wrong vs test wrong vs intermittent). The feedback loops (gap → refinement, flake → debug mode) are well-defined. | 3 / 3 |
| Progressive Disclosure | The skill references four sub-files (references/plan.md, references/write.md, references/run.md, references/debug.md) plus two artifact files, which is good structure. However, no bundle files were provided, so we can't verify these references exist or contain useful content. The main file delegates nearly all concrete/actionable detail to these references, making the SKILL.md itself more of a table of contents than a self-contained overview with actionable quick-start content. | 2 / 3 |
| Total | 9 / 12 Passed | |
Validation
81%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 9 / 11 Passed | |
632c389