Four-phase debugging framework that ensures root cause investigation before attempting fixes. Never jump to solutions. Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes.
Score: 77
Quality: 64% — Does it follow best practices?
Impact: 100% (1.03x average score across 3 eval scenarios)
Passed — no known issues
Optimize this skill with Tessl:
npx tessl skill review --optimize ./plugins/systematic-debugging/skills/systematic-debugging/SKILL.md

Quality
Discovery
82% — Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a solid description that clearly communicates both what the skill does and when to use it, with good natural trigger terms. Its main weaknesses are the lack of specificity about what the four phases actually entail and the broad scope that could potentially conflict with other debugging-related skills.
Suggestions
List the four phases explicitly (e.g., 'reproduce, diagnose, identify root cause, verify fix') to increase specificity and help Claude understand exactly what methodology this skill provides.
Narrow the distinctiveness by clarifying what differentiates this from general debugging assistance—e.g., 'Use instead of jumping directly to code fixes when a systematic investigation is needed.'
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (debugging) and describes a high-level approach ('four-phase debugging framework', 'root cause investigation before attempting fixes'), but does not list the specific concrete actions or phases involved. | 2 / 3 |
| Completeness | Clearly answers both what ('Four-phase debugging framework that ensures root cause investigation before attempting fixes') and when ('Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes') with an explicit 'Use when' clause. | 3 / 3 |
| Trigger Term Quality | Includes natural trigger terms users would say: 'bug', 'test failure', 'unexpected behavior', and 'fixes'. These are terms a user would naturally use when encountering issues. | 3 / 3 |
| Distinctiveness / Conflict Risk | While it targets debugging specifically, the broad scope ('any bug, test failure, or unexpected behavior') could overlap with other debugging or testing skills. The 'four-phase' and 'root cause investigation before fixes' framing adds some distinctiveness, but the trigger conditions are very broad. | 2 / 3 |
| Total | | 10 / 12 — Passed |
Implementation
47% — Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill provides a well-structured four-phase debugging workflow with clear sequencing and escalation paths, but it is significantly too verbose. It spends excessive tokens on motivational content, rationalization tables, and mindset coaching that Claude doesn't need, while the actionable technical guidance (concrete commands, diagnostic techniques) is relatively thin. The skill would benefit greatly from cutting the philosophical content and adding more concrete diagnostic techniques.
Suggestions
Cut the 'Common Rationalizations' table, 'Real-World Impact' section, and most of the 'When to Use' / 'Don't skip when' lists — these are motivational content Claude doesn't need, saving ~40% of tokens.
Replace abstract instructions like 'Form Single Hypothesis' and 'Find Working Examples' with concrete techniques or commands (e.g., specific git commands for bisecting, specific logging patterns for tracing).
Move the Red Flags list and Quick Reference table to a separate reference file, keeping the main skill focused on the four-phase process with concrete examples.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose for a debugging methodology skill. The 'When to Use', 'Red Flags', 'Common Rationalizations', and motivational content ('Random fixes waste time') are things Claude already knows. The repeated emphasis on 'STOP' and 'DON'T' is excessive. The 'Real-World Impact' section with made-up statistics adds no actionable value. Much of this could be cut by 50%+ without losing useful guidance. | 1 / 3 |
| Actionability | The four-phase framework provides a structured process, and the multi-component diagnostic example with bash commands is concrete and useful. However, much of the content is philosophical rather than executable — it tells Claude what mindset to have rather than giving specific commands or code to run. The Phase 4 test examples are generic (just 'bun test'), and many steps are abstract instructions like 'Form Single Hypothesis' rather than concrete techniques. | 2 / 3 |
| Workflow Clarity | The four-phase workflow is clearly sequenced with explicit gates between phases ('MUST complete each phase before proceeding'). Phase 4 includes a clear feedback loop: if the fix doesn't work after <3 attempts, return to Phase 1; if ≥3 attempts, question the architecture. The escalation path and validation checkpoints are well-defined. | 3 / 3 |
| Progressive Disclosure | References to companion skills (root-cause-tracing, defense-in-depth-validation, verification-before-completion) are present and one level deep, which is good. However, the main file itself is monolithic — the Common Rationalizations table, Red Flags list, Quick Reference table, and Real-World Impact section could be trimmed or moved. The inline content is too long for what should be an overview. | 2 / 3 |
| Total | | 8 / 12 — Passed |
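The "specific logging patterns for tracing" that the suggestions ask for could likewise be shown concretely. One minimal bash sketch (the `compute` function is a hypothetical stand-in for the code under investigation, not from the reviewed skill) scopes `set -x` tracing to the suspect region and tags each traced line with its source location:

```shell
# Sketch: line-level tracing for a shell script — one concrete
# alternative to abstract "add logging" advice. PS4 sets the prefix
# printed (to stderr) for every command traced by `set -x`.
export PS4='+ ${BASH_SOURCE:-stdin}:${LINENO}: '

compute() {                 # hypothetical function under investigation
  total=0
  for n in "$@"; do
    total=$((total + n))
  done
  echo "$total"
}

set -x                      # start tracing only around the suspect code
result=$(compute 1 2 3)
set +x
echo "result=$result"       # prints result=6; trace lines went to stderr
```

Scoping the trace with `set -x` / `set +x` keeps the output readable, and the `PS4` location prefix makes each trace line attributable — the kind of reusable diagnostic pattern the review says the skill's abstract steps lack.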
Validation
100% — Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure: no warnings or errors.