Drives development with tests. Use when implementing any logic, fixing any bug, or changing any behavior. Use when you need to prove that code works, when a bug report arrives, or when you're about to modify existing functionality.
56
63%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Advisory
Suggest reviewing before use
Optimize this skill with Tessl
npx tessl skill review --optimize ./skills/test-driven-development/SKILL.md
Quality
Discovery
49%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description has strong completeness with explicit 'Use when' clauses, but suffers from being overly broad in its triggers and vague in its capabilities. The scope ('any logic', 'any bug', 'any behavior') makes it a catch-all that would conflict with most development skills, and the 'what' portion lacks concrete actions like writing specific test types or using particular testing patterns.
Suggestions
Narrow the trigger conditions to be more specific, e.g., 'Use when the user asks to write tests, add test coverage, or practice TDD' rather than 'any logic, any bug, any behavior'.
Add concrete actions to the capability description, e.g., 'Writes unit tests, integration tests, and regression tests using TDD methodology. Generates test cases, assertions, and mocks.'
Include natural trigger terms users would say, such as 'TDD', 'unit test', 'test coverage', 'spec', 'test suite', 'regression test'.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description uses vague language like 'drives development with tests' without listing concrete actions. It doesn't specify what kind of tests, what frameworks, or what specific operations (e.g., write unit tests, run test suites, generate test fixtures). 'Drives development' is abstract. | 1 / 3 |
| Completeness | The description answers both 'what' (drives development with tests) and 'when' with explicit triggers ('Use when implementing any logic, fixing any bug, or changing any behavior. Use when you need to prove that code works, when a bug report arrives, or when you're about to modify existing functionality'). | 3 / 3 |
| Trigger Term Quality | Includes some relevant terms like 'tests', 'bug', 'logic', 'fixing', and 'behavior' that users might naturally say. However, it misses common variations like 'unit test', 'TDD', 'test-driven', 'test coverage', 'assertion', 'spec', or specific framework names. | 2 / 3 |
| Distinctiveness Conflict Risk | The triggers are extremely broad: 'implementing any logic', 'fixing any bug', 'changing any behavior' would match virtually every coding task. This would conflict with nearly any development-related skill and trigger far too often. | 1 / 3 |
| Total | | 7 / 12 Passed |
Implementation
77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a comprehensive, well-structured TDD skill with excellent actionability — concrete TypeScript examples, clear workflows, and explicit verification steps. Its main weakness is verbosity: it includes motivational content, philosophical justifications, and explanations of concepts Claude already understands (test pyramid, why TDD matters, common rationalizations). The content would benefit from trimming ~30-40% of explanatory prose and moving the Browser Testing and anti-patterns sections into referenced files.
Suggestions
Remove or drastically shorten the 'Common Rationalizations' table and 'Red Flags' section — these are motivational rather than actionable, and Claude doesn't need convincing about TDD's value.
Move the 'Browser Testing with DevTools' section into its own referenced file since it's a distinct concern from core TDD workflow and adds significant length.
Trim the Test Pyramid section — Claude knows what unit/integration/E2E tests are. Keep only the decision guide and resource model table.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is well-written but verbose for its audience. Sections like 'Common Rationalizations', the test pyramid explanation, the Beyoncé Rule, and general TDD philosophy are things Claude already knows. The ASCII diagrams are nice but consume tokens. The core actionable content could be ~40% shorter. | 2 / 3 |
| Actionability | Excellent executable TypeScript examples throughout: the TDD cycle, Prove-It Pattern, Arrange-Act-Assert, and good vs bad test examples are all concrete and copy-paste ready. The verification checklist and decision guide provide specific, actionable guidance. | 3 / 3 |
| Workflow Clarity | The TDD cycle (RED → GREEN → REFACTOR) and Prove-It Pattern are clearly sequenced with explicit validation checkpoints. The bug fix workflow includes verification steps (test fails → fix → test passes → full suite). The verification checklist at the end serves as a final checkpoint. | 3 / 3 |
| Progressive Disclosure | References to `browser-testing-with-devtools` and `references/testing-patterns.md` are mentioned but no bundle files are provided, making it impossible to verify they exist. The SKILL.md itself is quite long (~300 lines) and could benefit from moving the Browser Testing section and anti-patterns table into separate referenced files. | 2 / 3 |
| Total | | 10 / 12 Passed |
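For readers who have not seen the skill itself, the bug-fix workflow credited under Workflow Clarity (test fails → fix → test passes → full suite) could look roughly like the sketch below. It is not taken from the reviewed SKILL.md: `slugify`, `expectEqual`, and the test name are hypothetical, and a real project would use its own test framework instead of the hand-rolled assertion helper.

```typescript
// Hypothetical function under test (illustrative only, not from the reviewed skill).
// Imagine the bug report: "Hello,   World!" used to produce "hello---world-".
function slugify(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse each run of non-alphanumerics into one hyphen
    .replace(/^-+|-+$/g, ""); // strip leading and trailing hyphens
}

// Minimal assertion helper so the sketch runs without a test framework (an assumption).
function expectEqual<T>(actual: T, expected: T, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${String(expected)}, got ${String(actual)}`);
  }
}

// Regression test: written first so it fails against the buggy code (RED),
// then kept in the suite once the fix above makes it pass (GREEN).
function testCollapsesRepeatedSeparators(): void {
  // Arrange
  const input = "Hello,   World!";
  // Act
  const result = slugify(input);
  // Assert
  expectEqual(result, "hello-world", "collapses repeated separators");
}

testCollapsesRepeatedSeparators();
```

The test encodes the reported input verbatim, so it documents the bug and guards against its return; running the full suite afterwards is what catches regressions elsewhere.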
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
f17c6e8