Disciplined diagnosis loop for hard bugs and performance regressions. Reproduce → minimise → hypothesise → instrument → fix → regression-test. Use when user says "diagnose this" / "debug this", reports a bug, says something is broken/throwing/failing, or describes a performance regression.
90
88%
Does it follow best practices?
Impact
90%
0.97x Average score across 3 eval scenarios
Passed
No known issues
Quality
Discovery
100% Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly communicates a structured debugging methodology with specific steps, and provides excellent trigger guidance through an explicit 'Use when...' clause with multiple natural language variations. The description is concise yet comprehensive, uses third person voice appropriately, and carves out a distinct niche for hard bug diagnosis and performance regression investigation.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions in a clear methodology: 'Reproduce → minimise → hypothesise → instrument → fix → regression-test'. This describes a concrete, structured debugging workflow rather than vague language. | 3 / 3 |
| Completeness | Clearly answers both 'what' (disciplined diagnosis loop with specific steps) and 'when' (explicit 'Use when...' clause listing multiple trigger scenarios including bug reports, broken/throwing/failing states, and performance regressions). | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms users would actually say: 'diagnose this', 'debug this', 'bug', 'broken', 'throwing', 'failing', 'performance regression'. These are highly natural phrases users would use when encountering issues. | 3 / 3 |
| Distinctiveness Conflict Risk | Clearly carved out niche focused specifically on hard bugs and performance regressions with a disciplined diagnostic methodology. The specific trigger terms ('diagnose', 'debug', 'broken', 'throwing', 'failing', 'performance regression') are distinct and unlikely to conflict with general coding or testing skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
77% Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, well-structured diagnostic methodology skill that excels at workflow clarity and conciseness. It provides a disciplined six-phase process with explicit gates, checklists, and decision points. The main weakness is that while it names many concrete strategies, it rarely provides executable code or copy-paste-ready commands, and the referenced template script (hitl-loop.template.sh) doesn't exist in the bundle.
Suggestions
Add at least one executable code example for the most common feedback loop types (e.g., a minimal pytest regression test template, a curl script template) to improve actionability.
Include the referenced 'scripts/hitl-loop.template.sh' as a bundle file, or remove the reference if it's not available.
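As one way to act on the first suggestion, a minimal pytest-style regression test template might look like the sketch below. It is an illustration only: `parse_duration` and the inputs are hypothetical stand-ins for whatever code and minimised reproduction the diagnosis produced, and nothing here is taken from the reviewed skill itself.

```python
# Hypothetical regression-test template. `parse_duration` is a stand-in for
# the real code under test; replace it with an import from the project.

def parse_duration(text: str) -> int:
    """Convert strings such as '1h30m' into a number of seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    total, digits = 0, ""
    for ch in text.strip():
        if ch.isdigit():
            digits += ch
        elif ch in units and digits:
            total += int(digits) * units[ch]
            digits = ""
        else:
            raise ValueError(f"unexpected character {ch!r} in {text!r}")
    if digits:
        raise ValueError(f"trailing number without a unit in {text!r}")
    return total


def test_regression_minimal_repro():
    # The smallest input that reproduced the (assumed) bug.
    assert parse_duration("1h30m") == 5400


def test_regression_neighbouring_cases():
    # Inputs close to the reproduction, to guard against a too-narrow fix.
    assert parse_duration(" 2m ") == 120
    assert parse_duration("") == 0
    # Malformed input should still be rejected after the fix.
    try:
        parse_duration("5")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a number without a unit")
```

Because pytest collects functions named `test_*` automatically, the file needs no pytest import. The intended use follows the loop the skill describes: run the test against the unfixed code and watch it fail, apply the fix, then watch it pass.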
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Every section earns its place. No explanation of what debugging is or how tools work — it assumes Claude's competence and focuses entirely on the discipline and decision framework Claude wouldn't already have. The numbered lists are dense with actionable options, not padding. | 3 / 3 |
| Actionability | The skill provides a highly structured methodology with concrete strategies (e.g., 10 ways to build a feedback loop, tagged debug logs with grep cleanup), but lacks executable code examples or copy-paste-ready commands. Guidance like 'Curl / HTTP script' or 'Headless browser script' names the approach without showing concrete implementation. The checklists and format templates (hypothesis format) partially compensate. | 2 / 3 |
| Workflow Clarity | Six clearly sequenced phases with explicit gate conditions ('Do not proceed until you reproduce the bug'), validation checklists (Phase 2 and Phase 6), feedback loops (fix → watch fail → fix → watch pass → re-run original loop), and error recovery guidance (Phase 1's 'When you genuinely cannot build a loop'). The workflow is exemplary for a complex, multi-step diagnostic process. | 3 / 3 |
| Progressive Disclosure | The content is well-organized with clear phase headers and sub-sections, but it's a single monolithic file with no references to supporting materials. The reference to '/improve-codebase-architecture' skill and 'scripts/hitl-loop.template.sh' are good signals, but there are no bundle files to support them. For a skill of this length (~150 lines), some content (e.g., the 10 feedback loop strategies) could be split into a reference file. | 2 / 3 |
| Total | | 10 / 12 Passed |
Validation
100% Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
a05d4e5