Post-execution test review and fix - chain from workflow-lite-execute or standalone. Reviews implementation against plan, runs tests, auto-fixes failures.
Overall score: 70

- Quality: 63% (Does it follow best practices?)
- Impact: Pending (No eval scenarios have been run)
- Risk: Risky (Do not use without reviewing)

Optimize this skill with Tessl:

`npx tessl skill review --optimize ./.claude/skills/workflow-lite-test-review/SKILL.md`

## Quality

### Discovery (50%)

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description conveys a reasonable sense of what the skill does—reviewing implementations, running tests, and fixing failures—but lacks an explicit 'Use when...' clause and natural trigger terms users would employ. The reference to 'workflow-lite-execute' is internal jargon that doesn't help with user-facing skill selection, and the actions could be more concrete.
#### Suggestions

- Add an explicit 'Use when...' clause, e.g., 'Use when tests fail after code implementation, when the user asks to run and fix tests, or after a workflow-lite-execute step completes.'
- Include more natural trigger terms users would say, such as 'failing tests', 'test errors', 'fix broken tests', 'debug test failures', 'run test suite'.
- Be more specific about the concrete actions: what types of tests (unit, integration), what 'reviewing against plan' means, and what kinds of auto-fixes are applied.
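Taken together, these suggestions might yield a description like the following sketch; the exact wording is illustrative, not text proposed by the review itself:

```yaml
# SKILL.md frontmatter (illustrative rewrite, not the skill's current text)
description: >
  Reviews an implementation against its plan, runs the unit and integration
  test suite, and auto-fixes failing tests. Use when tests fail after code
  implementation, when the user asks to run and fix tests (e.g. "failing
  tests", "fix broken tests", "debug test failures", "run test suite"), or
  after a workflow-lite-execute step completes.
```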
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names some actions ('reviews implementation against plan', 'runs tests', 'auto-fixes failures') but they are somewhat generic and not deeply specific about what kinds of tests, what kinds of fixes, or what 'reviewing against plan' entails concretely. | 2 / 3 |
| Completeness | The 'what' is partially addressed (reviews, runs tests, auto-fixes), but there is no explicit 'Use when...' clause. The phrase 'chain from workflow-lite-execute or standalone' hints at when but doesn't clearly articulate trigger conditions for Claude to select this skill. | 2 / 3 |
| Trigger Term Quality | Includes some relevant terms like 'test', 'fix', 'failures', and 'post-execution', but misses common natural user phrases like 'run tests', 'debug', 'test failures', 'failing tests', 'fix broken tests'. The term 'workflow-lite-execute' is internal jargon unlikely to be used by users. | 2 / 3 |
| Distinctiveness / Conflict Risk | The combination of 'post-execution test review' and 'chain from workflow-lite-execute' provides some distinctiveness, but terms like 'runs tests' and 'auto-fixes failures' could overlap with general testing or debugging skills. The niche is somewhat defined but not sharply delineated. | 2 / 3 |
| Total | | 8 / 12 (Passed) |
### Implementation (77%)

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-crafted workflow skill with excellent actionability and workflow clarity — phases are clearly sequenced with validation checkpoints, feedback loops for test fixing, and concrete executable code throughout. The main weakness is that all content lives in a single file without progressive disclosure; the detailed data structures and phase implementations could benefit from being split into reference files. Conciseness is adequate but some explanatory sections could be tightened.
#### Suggestions

- Consider extracting the Data Structures section and detailed phase implementations into separate reference files (e.g., DATA-STRUCTURES.md, PHASES.md) and linking from the main skill with a brief summary table.
- Tighten the Input Modes section: the prose explanation and the code block are somewhat redundant; the code block alone with brief inline comments would suffice.
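For the first suggestion, the main SKILL.md might link out like the following sketch (the file names are the suggestion's own examples; the topic wording is assumed):

```markdown
## Reference files

| Topic | File |
|---|---|
| Data structure schemas | [DATA-STRUCTURES.md](DATA-STRUCTURES.md) |
| Detailed phase implementations | [PHASES.md](PHASES.md) |
```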
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is fairly long (~250 lines) but most content is structural (phase definitions, data schemas, code blocks). Some sections like the Mode 1/Mode 2 explanation could be tighter, and the data structures section is verbose but serves as a reference. Overall mostly efficient with some room for tightening. | 2 / 3 |
| Actionability | Provides fully executable JavaScript code for session resolution, agent delegation, CLI invocation, and test framework detection. Commands are concrete (e.g., `npm test`, `python -m pytest -v --tb=short`), data structures have complete schemas, and the auto-fix loop has specific iteration logic with clear agent prompts. | 3 / 3 |
| Workflow Clarity | Five phases are clearly sequenced with explicit TodoWrite checkpoints between each phase. Includes validation/feedback loops (Phase 4 iterative fix with max 3 rounds, re-run and break on pass), skip conditions are clearly stated, and Phase 5 has a mandatory checkpoint callout. Error handling table covers failure modes with resolutions. | 3 / 3 |
| Progressive Disclosure | The content is well-structured with clear sections and a phase summary table for navigation, but it's monolithic: all detail is inline in a single file with no references to supporting documents. The data structures and detailed phase logic could be split into separate reference files. The one external reference (`ccw spec load --category test`) is appropriate but the rest is a wall of detailed content. | 2 / 3 |
| Total | | 10 / 12 (Passed) |
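The Phase 4 behavior the table credits (iterative fix with a cap of 3 rounds, re-run and break on pass) can be sketched as follows. This is an illustrative reconstruction, not the skill's actual code; the function names and result shape are assumptions:

```javascript
// Illustrative sketch of the Phase 4 auto-fix loop: run tests, delegate
// failures to a fixing agent, re-run, and stop on pass or after 3 rounds.
const MAX_FIX_ROUNDS = 3;

function runFixLoop(runTests, applyFix) {
  let result = runTests();
  for (let round = 1; round <= MAX_FIX_ROUNDS && !result.passed; round++) {
    applyFix(result.failures); // hypothetical delegation to the fixing agent
    result = runTests();       // re-run the suite after each fix attempt
  }
  return result.passed
    ? { status: "passed" }
    : { status: "failed", failures: result.failures };
}
```

The cap keeps the loop from thrashing on unfixable failures; breaking as soon as a re-run passes avoids unnecessary agent invocations.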
### Validation (90%)

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

#### Validation for skill structure

| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| Total | | 10 / 11 Passed |