TDD iteration loops using Claude Code Stop hooks - runs tests after each response, feeds failures back automatically
58
67% (Does it follow best practices?)
Impact: no score yet; no eval scenarios have been run
Passed: No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./skills/iterative-development/SKILL.md
Quality
Discovery
57%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies a clear and distinctive niche (TDD loops with Claude Code Stop hooks) and conveys the core mechanism well. However, it lacks an explicit 'Use when...' clause and could benefit from broader trigger term coverage to help Claude select it in more natural user request scenarios.
Suggestions
Add an explicit 'Use when...' clause, e.g., 'Use when the user wants to set up test-driven development loops, automated test feedback, or continuous red-green-refactor cycles with Claude Code.'
Include additional natural trigger terms such as 'test-driven development', 'red-green-refactor', 'unit tests', 'test automation', and 'continuous test feedback' to improve discoverability.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (TDD iteration loops) and mentions specific mechanisms (Stop hooks, running tests, feeding failures back), but doesn't list multiple concrete actions beyond the single workflow pattern. | 2 / 3 |
| Completeness | Describes what it does (TDD iteration loops using Stop hooks, runs tests, feeds failures back) but lacks an explicit 'Use when...' clause or equivalent trigger guidance, which caps this at 2 per the rubric. | 2 / 3 |
| Trigger Term Quality | Includes relevant terms like 'TDD', 'tests', 'failures', and 'Stop hooks', but misses common user variations like 'test-driven development', 'red-green-refactor', 'unit tests', 'test loop', or 'automated testing'. | 2 / 3 |
| Distinctiveness Conflict Risk | The combination of 'TDD iteration loops', 'Claude Code Stop hooks', and automatic test feedback creates a very specific niche that is unlikely to conflict with other skills. | 3 / 3 |
| Total | | 9 / 12 Passed |
Implementation
77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, highly actionable skill with complete executable scripts and clear workflow mechanics. Its main weakness is length: the Python variant, philosophy section, and supplementary hooks add bulk that could be offloaded to separate files. The error classification table and use-case guidance are valuable additions that demonstrate thoughtful design.
Suggestions
Remove or significantly condense the 'Core Philosophy' ASCII box — these are general development principles Claude already understands.
Consider moving the Python variant and additional hooks (PreToolUse, SessionStart) to separate referenced files to reduce the main skill's token footprint.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill contains some unnecessary verbosity: the ASCII box diagrams for 'Core Philosophy' add little value, the 'How It Actually Works' section partially repeats the concept introduction, and the Python variant is largely duplicative of the bash script. However, the core content (hook config, scripts, error classification) is mostly useful and not padded with things Claude already knows. | 2 / 3 |
| Actionability | The skill provides fully executable, copy-paste ready code: complete JSON configuration for hooks, a full bash script with iteration tracking and safety limits, a Python variant, and additional hook configurations. All commands and file paths are specific and concrete. | 3 / 3 |
| Workflow Clarity | The workflow is clearly sequenced with the ASCII flow diagram showing the exact loop mechanics. Validation is built into the core concept (tests/lint/typecheck as checkpoints), there's explicit error recovery (exit 2 feeds back, exit 0 stops), a safety valve (MAX_ITERATIONS), and clear error classification distinguishing loopable vs non-loopable failures. | 3 / 3 |
| Progressive Disclosure | The content is well-structured with clear sections and headers, but it's quite long for a single file with no bundle files to offload detail into. The Python variant, additional hooks, and use-case tables could be split into separate reference files. For a standalone SKILL.md with no bundle, it's reasonably organized but borders on monolithic. | 2 / 3 |
| Total | | 10 / 12 Passed |
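The loop mechanics credited in the Workflow Clarity row (exit 2 feeds failures back, exit 0 lets the agent stop, MAX_ITERATIONS acts as a safety valve) might look roughly like the sketch below. This is an illustration under stated assumptions, not the skill's actual script: the test command, the counter-file path, the limit of 5, and every function name are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# All three values are assumptions for illustration; the reviewed skill's
# real limit, state file, and test command are not shown in this review.
MAX_ITERATIONS = 5
COUNTER_FILE = Path(".claude/tdd-iteration-count")
TEST_COMMAND = [sys.executable, "-m", "pytest", "-q"]


def decide_exit_code(tests_passed: bool, iteration: int,
                     max_iterations: int = MAX_ITERATIONS) -> int:
    """Return 0 to let the agent stop, or 2 to feed failures back."""
    if tests_passed:
        return 0  # green: nothing to fix, allow the stop
    if iteration >= max_iterations:
        return 0  # safety valve: give up rather than loop forever
    return 2      # red: block the stop and report the failures


def read_iteration(path: Path) -> int:
    """Read the persisted iteration count, treating a missing or bad file as 0."""
    try:
        return int(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return 0


def main() -> int:
    iteration = read_iteration(COUNTER_FILE) + 1
    result = subprocess.run(TEST_COMMAND, capture_output=True, text=True)
    code = decide_exit_code(result.returncode == 0, iteration)
    if code == 2:
        # Persist the count so the next hook invocation continues the loop.
        COUNTER_FILE.parent.mkdir(parents=True, exist_ok=True)
        COUNTER_FILE.write_text(str(iteration))
        # Failure output goes to stderr so it can be fed back to the agent.
        print(f"Tests failed (iteration {iteration}/{MAX_ITERATIONS}):\n"
              f"{result.stdout}{result.stderr}", file=sys.stderr)
    else:
        # Passing tests or an exhausted budget both reset the loop state.
        COUNTER_FILE.unlink(missing_ok=True)
    return code

# A real hook script would end with: sys.exit(main())
```

Keeping the decision in a small pure function such as `decide_exit_code` makes the safety valve easy to check in isolation, separately from running the test suite.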
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |
7e5f7a2