Disciplined diagnosis loop for hard bugs and performance regressions. Reproduce → minimise → hypothesise → instrument → fix → regression-test. Use when the user says "diagnose this" / "debug this", reports a bug, says something is broken/throwing/failing, or describes a performance regression.
68
81%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly communicates both the methodology (a structured diagnosis loop) and when to use it (debugging, bug reports, failures, performance regressions). It uses third person voice, includes natural trigger terms, and is concise without being vague. The explicit step-by-step process and comprehensive trigger phrases make it highly effective for skill selection.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists a concrete, multi-step methodology: 'Reproduce → minimise → hypothesise → instrument → fix → regression-test.' These are specific, actionable steps that clearly describe what the skill does. | 3 / 3 |
| Completeness | Clearly answers both 'what' (disciplined diagnosis loop with explicit steps) and 'when' (explicit 'Use when...' clause listing multiple trigger phrases and scenarios). | 3 / 3 |
| Trigger Term Quality | Includes a strong set of natural trigger terms users would actually say: 'diagnose this', 'debug this', 'bug', 'broken', 'throwing', 'failing', 'performance regression'. These cover common variations of how users describe problems. | 3 / 3 |
| Distinctiveness Conflict Risk | The focus on a structured diagnosis loop for hard bugs and performance regressions is a clear niche. The specific methodology and trigger terms distinguish it from general coding or testing skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
62%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured diagnostic methodology skill with excellent workflow clarity and phase gating. Its main weaknesses are the lack of executable code examples (relying on descriptive guidance rather than copy-paste commands) and some verbosity in explanatory asides. The referenced bundle files (hitl-loop.template.sh, etc.) are not provided, which weakens the progressive disclosure story.
Suggestions
Add at least 2-3 concrete, executable code/command examples — e.g., a sample bisection harness script, a curl-based feedback loop, or a tagged debug log pattern in a specific language.
Provide the referenced scripts/hitl-loop.template.sh file and consider extracting the 10 loop-construction techniques into a separate reference file to improve progressive disclosure.
Trim explanatory parentheticals in checklists (e.g., 'one user vs. all users, prod vs. dev') that Claude can infer from context.
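As an illustration of the first suggestion, a bisection harness could be as small as the sketch below. It is not taken from the skill: `first_bad`, `candidates`, and `is_bad` are hypothetical names, and the sketch assumes the candidates are ordered oldest to newest, that the newest one reproduces the bug, and that once a candidate is bad every later one is bad too.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(candidates: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the earliest candidate for which is_bad() is true.

    Assumptions (hypothetical, not from the skill): candidates are ordered
    oldest to newest, and badness is monotonic (good ... good, bad ... bad).
    """
    if not candidates:
        raise ValueError("no candidates to bisect")
    lo, hi = 0, len(candidates) - 1
    # Without a known-bad endpoint there is nothing to search for.
    if not is_bad(candidates[hi]):
        raise ValueError("newest candidate is not bad; nothing to bisect")
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid        # the first bad candidate is at mid or earlier
        else:
            lo = mid + 1    # the first bad candidate is after mid
    return candidates[lo]


if __name__ == "__main__":
    # Stand-in data: revisions 0..9, with the regression introduced at 6.
    revisions = list(range(10))
    print(first_bad(revisions, lambda rev: rev >= 6))
```

In real use, `is_bad` would wrap whatever reproduction the feedback loop already provides, for example running a failing test against a checked-out revision and returning whether it failed.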
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is well-written and mostly efficient, but some sections could be tightened. The Phase 0 checklist items include explanatory parentheticals that Claude doesn't need (e.g., 'one user vs. all users, prod vs. dev, recoverable vs. data-loss'). The Phase 1 list of 10 loop construction methods is thorough but some entries have unnecessary elaboration. Overall it respects Claude's intelligence but isn't maximally lean. | 2 / 3 |
| Actionability | The skill provides a highly structured methodology with concrete checklists and specific techniques, but lacks executable code examples. The feedback loop section lists approaches (curl scripts, Playwright, bisection) without providing copy-paste-ready commands or code snippets. The hypothesis format template is concrete and actionable, but most guidance remains at the 'what to do' level rather than 'here is the exact command'. | 2 / 3 |
| Workflow Clarity | Excellent multi-step workflow with six clearly sequenced phases, explicit gate conditions between phases ('Don't proceed until you reproduce', 'Don't proceed to Phase 2 until you have a loop'), validation checkpoints (Phase 2 confirmation checklist, Phase 6 cleanup checklist), and feedback loops (iterate on the loop itself, fix→test→verify cycle). The 'when you genuinely cannot build a loop' escape hatch is a strong error-recovery pattern. | 3 / 3 |
| Progressive Disclosure | The content references external files (scripts/hitl-loop.template.sh, CONTEXT.md, docs/adr/, ATTRIBUTION.md, improve-codebase-architecture) but no bundle files are provided to support these references. The skill itself is a single long document (~150 lines) that could benefit from splitting detailed technique lists (e.g., the 10 loop construction methods) into a reference file. The structure within the file is good with clear phase headers. | 2 / 3 |
| Total | | 9 / 12 Passed |
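
The Actionability row asks for copy-paste-ready snippets, and the suggestions name a tagged debug log pattern as one candidate. A minimal sketch of that idea in Python follows. The tag `[DBG-H1]` and the helper `make_debug_logger` are hypothetical, chosen only for illustration; the skill's own conventions may differ.

```python
import io
import logging
import sys
from typing import TextIO


def make_debug_logger(tag: str, stream: TextIO = sys.stderr) -> logging.Logger:
    """Build a logger that prefixes every line with a searchable tag.

    The tag (for example "[DBG-H1]" for hypothesis 1) is a hypothetical
    convention; it must not contain "%" because it is embedded in the
    logging format string.
    """
    logger = logging.getLogger(f"diagnosis.{tag}")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep temporary debug lines out of root handlers
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(f"{tag} %(message)s"))
    logger.handlers = [handler]  # replace handlers so repeated calls do not duplicate output
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = make_debug_logger("[DBG-H1]", buffer)
    log.debug("cache size=%d before eviction", 3)
    print(buffer.getvalue(), end="")
```

Because every temporary line carries the same tag, a plain text search for the tag finds all instrumentation when it is time to remove it.
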
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
be88d6c