Build behavior-safe code changes with TDD and RED/GREEN evidence. Use when he-plan or he-work requires TDD for a concrete behavior target.
56
66%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./Plugins/harness-engineering/fixtures/budget-archive/2026-04-21/deferred-store/skills/team_automation/he-tdd/SKILL.mdQuality
Discovery
75%Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description establishes a clear niche around TDD-based code changes with explicit trigger conditions, making it reasonably complete and distinctive. However, it relies on internal/custom terminology ('he-plan', 'he-work') that limits natural trigger term coverage, and the specific capabilities could be more concretely enumerated (e.g., writing failing tests, implementing minimal code, verifying green state).
Suggestions
Expand trigger terms to include natural user language like 'test-driven development', 'unit tests', 'test first', 'write tests before code'.
List more specific concrete actions such as 'writes failing tests, implements minimal code to pass, verifies RED-to-GREEN transitions, and refactors safely'.
| Dimension | Reasoning | Score |
|---|---|---|
Specificity | It names the domain (TDD, code changes) and mentions some actions ('build behavior-safe code changes', 'RED/GREEN evidence'), but doesn't list multiple specific concrete actions like writing tests, running tests, refactoring, or verifying test output. | 2 / 3 |
Completeness | It answers both 'what' (build behavior-safe code changes with TDD and RED/GREEN evidence) and 'when' (when he-plan or he-work requires TDD for a concrete behavior target) with an explicit 'Use when' clause containing trigger conditions. | 3 / 3 |
Trigger Term Quality | Includes relevant terms like 'TDD', 'RED/GREEN', and 'behavior target', but uses non-standard terms like 'he-plan' and 'he-work' which are internal/custom references unlikely to be naturally said by users. Missing common variations like 'test-driven development', 'unit tests', 'test first'. | 2 / 3 |
Distinctiveness Conflict Risk | The combination of TDD, RED/GREEN evidence, and the specific references to 'he-plan' and 'he-work' create a clear niche that is unlikely to conflict with other skills. It targets a very specific workflow pattern. | 3 / 3 |
Total | 10 / 12 Passed |
Implementation
57%Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill is well-structured as a progressive disclosure entry point with excellent reference organization and clear 'Read when' signals. However, the body itself lacks concrete, executable examples of the TDD workflow it describes—no sample test code, no framework-specific commands, and no explicit validation checkpoints within the RED/GREEN cycle. The content would benefit from at least one concrete code example showing a RED-to-GREEN transition and more explicit inline verification steps.
Suggestions
Add a concrete, executable example showing a minimal RED test followed by the GREEN fix, using a specific test framework (e.g., pytest or jest) so Claude has a copy-paste-ready template.
Add explicit validation checkpoints within the Procedure steps, e.g., 'Confirm test output shows FAIL before proceeding to step 3' and 'If test does not fail, revisit the assertion before writing implementation code.'
Remove or consolidate the redundant Subagent Routing section since it's already referenced in Full Context, and trim the meta-commentary in the opening paragraph about what 'archived' means.
| Dimension | Reasoning | Score |
|---|---|---|
Conciseness | The skill is moderately efficient but includes some unnecessary sections. The 'Progressive Disclosure Entry' preamble explaining what 'archived' means is meta-commentary Claude doesn't need. The Subagent Routing section repeats a reference already listed in Full Context. The Examples section provides natural language prompts rather than actionable examples, and the Philosophy section adds little operational value. | 2 / 3 |
Actionability | The procedure provides a clear sequence but remains at a high level of abstraction—'produce a failing test first (RED), then apply the smallest fix (GREEN)' is directional rather than concrete. There are no executable code examples, no specific test framework commands (beyond the validation audit command), and no sample test code showing what a RED/GREEN cycle looks like in practice. The skill relies heavily on deferred references for actual operational detail. | 2 / 3 |
Workflow Clarity | The 5-step procedure provides a reasonable sequence and the constraint 'Do not skip RED verification' acts as a checkpoint. However, there are no explicit validation gates between steps (e.g., 'confirm RED output before proceeding to GREEN'), no error recovery guidance if a test doesn't fail as expected, and the validation section only covers skill auditing rather than the TDD workflow itself. For a workflow involving iterative test-code cycles, the lack of inline verification checkpoints is a gap. | 2 / 3 |
Progressive Disclosure | The skill excels at progressive disclosure with a concise overview and well-organized, clearly signaled references with 'Read when' annotations explaining when to load each reference. References are one level deep and cover distinct concerns (mocking, interface design, refactoring, etc.). The structure makes it easy to navigate to the right detailed document based on the current need. | 3 / 3 |
Total | 9 / 12 Passed |
Validation
81%Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
metadata_version | 'metadata.version' is missing | Warning |
metadata_field | 'metadata' should map string keys to string values | Warning |
Total | 9 / 11 Passed | |
d00c351
Table of Contents
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.