Use when implementing any feature or bugfix, before writing implementation code - write the test first, watch it fail, write minimal code to pass; ensures tests actually verify behavior by requiring failure first
68
61%
Does it follow best practices?
Impact: Pending (no eval scenarios have been run)
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./plugins/tdd/skills/test-driven-development/SKILL.md
Quality
Discovery
67%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description effectively communicates the TDD workflow and includes an explicit 'Use when' clause, which is a strength. However, it lacks key trigger terms users would naturally use (like 'TDD' or 'test-driven development') and its broad trigger conditions ('any feature or bugfix') create overlap risk with other implementation-focused skills. The description reads more as a process instruction than a capability description.
Suggestions
Add explicit trigger terms users would naturally say, such as 'TDD', 'test-driven development', 'red-green-refactor', 'unit test first', and 'write tests before code'.
Narrow the trigger scope to reduce conflict risk - instead of 'any feature or bugfix', specify 'when the user requests TDD, test-first development, or asks to write tests before implementation'.
List more concrete actions the skill enables, such as 'Generates failing test cases, validates test failure, implements minimal passing code, and refactors while keeping tests green'.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | It describes a methodology (write test first, watch it fail, write minimal code to pass) which names the domain (TDD) and some actions, but doesn't list multiple concrete specific actions beyond the general TDD cycle. The actions are more process-oriented than capability-oriented. | 2 / 3 |
| Completeness | The description explicitly answers both 'what' (write the test first, watch it fail, write minimal code to pass, ensures tests verify behavior) and 'when' (when implementing any feature or bugfix, before writing implementation code) with a clear 'Use when' clause at the beginning. | 3 / 3 |
| Trigger Term Quality | Includes some relevant terms like 'test first', 'feature', 'bugfix', 'implementation code', but misses common natural trigger terms users would say such as 'TDD', 'test-driven development', 'red-green-refactor', 'unit test', or 'write tests'. | 2 / 3 |
| Distinctiveness / Conflict Risk | The description is somewhat specific to TDD methodology, but terms like 'feature', 'bugfix', and 'implementation code' are very broad and could overlap with general coding skills, code review skills, or testing skills that aren't specifically TDD-focused. | 2 / 3 |
| Total | | 9 / 12 Passed |
Implementation
55%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill provides excellent actionable guidance with clear workflow steps, verification checkpoints, and executable code examples. However, it is severely bloated—the same core messages are repeated across multiple sections (rationalizations table, 'Why Order Matters' prose, red flags list), and the anti-patterns content should be a separate file. The persuasive/motivational tone ('Thinking about skipping TDD? Stop. That's rationalization.') wastes tokens on things Claude doesn't need to be convinced of.
Suggestions
Cut the content by ~50%: merge the 'Why Order Matters' prose section and 'Common Rationalizations' table into a single concise table, and remove the 'Red Flags' list which largely duplicates both.
Split the Testing Anti-Patterns into a separate ANTI_PATTERNS.md file referenced from the main skill, reducing the monolithic structure and improving progressive disclosure.
Remove persuasive/motivational language ('Stop. That's rationalization.', 'Sunk cost fallacy.') — Claude doesn't need to be convinced, it needs instructions.
Eliminate the dot graph definition which is not renderable in most contexts and replace with a simple numbered list or ASCII flow that's already present in the step-by-step sections.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is extremely verbose at ~400+ lines. It repeats the same points multiple times (e.g., 'delete code and start over' appears in at least 5 places, the rationalizations table duplicates the 'Why Order Matters' section almost entirely). The anti-patterns section rehashes TDD principles already covered. Claude already understands TDD conceptually; this should focus on the specific workflow rules, not persuasive essays about why TDD matters. | 1 / 3 |
| Actionability | The skill provides fully executable TypeScript code examples for both tests and implementations, concrete bash commands for verification steps, and specific good/bad comparisons. The bug fix example walks through a complete TDD cycle with real code. | 3 / 3 |
| Workflow Clarity | The Red-Green-Refactor cycle is clearly sequenced with explicit mandatory verification steps at each stage ('Verify RED - Watch It Fail' is marked MANDATORY). Feedback loops are present (wrong failure → fix test, test passes unexpectedly → fix test, other tests fail → fix now). The verification checklist at the end provides a comprehensive gate before completion. | 3 / 3 |
| Progressive Disclosure | This is a monolithic wall of text with no bundle files or external references. The TDD skill and Testing Anti-Patterns content are combined into a single massive file when they could easily be split. There's no navigation structure: the anti-patterns section alone could be a separate referenced file, and the rationalizations/common excuses sections are redundant with each other. | 1 / 3 |
| Total | | 8 / 12 Passed |
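For readers unfamiliar with the sequence credited under Workflow Clarity, the write-test, watch-it-fail, write-minimal-code loop might look roughly like the sketch below. It is illustrative only and is not taken from the reviewed SKILL.md: `slugify`, `slugifyStub`, and `testSlugify` are hypothetical names, and a real project would run the test through its own test runner rather than a hand-rolled check.

```typescript
// Hypothetical example; none of these names come from the reviewed skill.

// RED: before any implementation exists, the test runs against a stub
// and must fail.
function slugifyStub(_title: string): string {
  throw new Error("not implemented");
}

// GREEN: the smallest implementation that satisfies the test below.
function slugify(title: string): string {
  return title.trim().toLowerCase().replace(/\s+/g, "-");
}

// The test, parameterized over the implementation so both phases can be shown.
// Returns true when the behavior matches, false on a mismatch or a throw.
function testSlugify(impl: (title: string) => string): boolean {
  try {
    return impl("  Hello World ") === "hello-world";
  } catch {
    return false;
  }
}

const redResult = testSlugify(slugifyStub); // fails first, as the RED step requires
const greenResult = testSlugify(slugify); // passes once the minimal code exists

console.log({ redResult, greenResult });
```

Seeing the failure first is what shows the test can detect missing behavior; a test that passes against the stub would verify nothing.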
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| skill_md_line_count | SKILL.md is long (699 lines); consider splitting into references/ and linking | Warning |
| Total | 10 / 11 Passed | |
dedca19