Use when implementing any feature or bugfix, before writing implementation code
44
56%: Does it follow best practices?
Impact: — (no eval scenarios have been run)
Passed: no known issues
Quality
Discovery
14%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description is critically weak because it fails to state what the skill actually does. It only provides a vague timing cue ('before writing implementation code') without any concrete actions or capabilities. The extreme generality ('any feature or bugfix') makes it nearly impossible to distinguish from other development-related skills.
Suggestions
Add concrete actions describing what this skill does (e.g., 'Creates implementation plans, identifies edge cases, and outlines test scenarios for features and bugfixes').
Add a clear 'Use when...' clause with specific trigger terms (e.g., 'Use when the user asks to plan a feature, design a solution, or prepare for implementation').
Narrow the scope to reduce conflict risk—specify what distinguishes this from other pre-coding activities like architecture review, requirements gathering, or test planning.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description contains no concrete actions whatsoever. It doesn't describe what the skill does, only vaguely when to use it ('implementing any feature or bugfix, before writing implementation code'). There are no specific capabilities listed. | 1 / 3 |
| Completeness | The description answers 'when' partially ('before writing implementation code') but completely fails to answer 'what does this do'. Without knowing what the skill actually does, this is fundamentally incomplete. | 1 / 3 |
| Trigger Term Quality | It includes some relevant terms like 'feature', 'bugfix', and 'implementation code' that users might naturally mention, but the coverage is incomplete and the terms are fairly generic. Missing variations like 'bug fix', 'new feature', 'coding', 'development workflow', etc. | 2 / 3 |
| Distinctiveness Conflict Risk | The description is extremely generic: 'any feature or bugfix' and 'before writing implementation code' could apply to dozens of skills (planning, design, testing, architecture, code review, etc.). There is nothing to distinguish this from other pre-implementation skills. | 1 / 3 |
| Total | | 5 / 12, Passed |
Implementation
62%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill excels at actionability and workflow clarity with concrete, executable examples and explicit verification checkpoints at each TDD phase. However, it is significantly undermined by extreme verbosity and repetition—the philosophical arguments for TDD are restated 3-4 times across different sections (Why Order Matters, Common Rationalizations table, Red Flags list), explaining concepts Claude already understands. Cutting the repetitive motivational content would make this a much stronger skill.
Suggestions
Consolidate 'Why Order Matters,' 'Common Rationalizations,' and 'Red Flags' into a single concise section—currently the same points (delete code, tests-after prove nothing, sunk cost) appear in all three.
Remove explanations of concepts Claude already knows (sunk cost fallacy, manual vs automated testing advantages, what technical debt is) and replace with terse directives.
Move the rationalizations/philosophy content to a separate reference file (e.g., TDD-RATIONALE.md) and keep SKILL.md focused on the actionable workflow.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose and repetitive. The same points about 'delete code and start over,' 'tests after prove nothing,' and 'sunk cost fallacy' are restated across multiple sections (Why Order Matters, Common Rationalizations, Red Flags). The 'Why Order Matters' section explains concepts Claude already understands (sunk cost fallacy, manual vs automated testing). The rationalizations table largely duplicates the prose above it. This could be cut by 50%+ without losing actionable content. | 1 / 3 |
| Actionability | Provides fully executable TypeScript code examples for each TDD phase (RED, GREEN, REFACTOR), concrete bash commands for verification, a complete bug fix walkthrough, and specific good/bad comparisons. The guidance is copy-paste ready and leaves no ambiguity about what to do. | 3 / 3 |
| Workflow Clarity | The Red-Green-Refactor cycle is clearly sequenced with explicit mandatory verification checkpoints after both RED and GREEN phases. Includes feedback loops (test errors → fix → re-run; test passes unexpectedly → fix test) and a comprehensive verification checklist before marking work complete. | 3 / 3 |
| Progressive Disclosure | References @testing-anti-patterns.md for advanced mock guidance, which is good progressive disclosure. However, the main SKILL.md is monolithic with significant content that could be split out (the lengthy rationalizations section, the detailed 'Why Order Matters' arguments). For a skill this long (~250 lines), more content should be offloaded to supporting files. | 2 / 3 |
| Total | | 9 / 12, Passed |
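For readers unfamiliar with the cycle the Workflow Clarity row refers to, the following is a minimal sketch of one RED-to-GREEN pass with its verification checkpoints. It is not taken from the reviewed skill: `slugify`, `check`, and `testSlugify` are hypothetical names chosen for illustration, and the helper stands in for whatever test framework the skill actually assumes.

```typescript
// Hypothetical illustration; none of these names come from the reviewed skill.

// Minimal assertion helper so the sketch needs no test framework.
function check(actual: string, expected: string): void {
  if (actual !== expected) {
    throw new Error(`expected "${expected}", got "${actual}"`);
  }
}

// RED: write the test first and run it before `slugify` has a real body.
// Checkpoint: it must fail for the expected reason (wrong output),
// not because of a typo or setup error in the test itself.
function testSlugify(): void {
  check(slugify("Hello World"), "hello-world");
  check(slugify("  Trim  me "), "trim-me");
}

// GREEN: the simplest implementation that makes the test pass.
function slugify(input: string): string {
  return input.trim().toLowerCase().split(/\s+/).join("-");
}

// Checkpoint: re-run the test and confirm it now passes without errors.
testSlugify();
```

A REFACTOR step would then change the body of `slugify` while re-running `testSlugify` after each change to confirm it stays green.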
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.