Four-phase debugging framework that ensures root cause investigation before attempting fixes. Never jump to solutions. Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes.
77
64%
Does it follow best practices?
Impact
100%
1.03x
Average score across 3 eval scenarios
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./plugins/systematic-debugging/skills/systematic-debugging/SKILL.md
Quality
Discovery
82%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a solid description that clearly communicates both what the skill does and when to use it, with good natural trigger terms. Its main weaknesses are the lack of specificity about what the four phases actually entail and the broad scope that could conflict with other debugging-related skills. Adding the phase names and narrowing the scope slightly would strengthen it.
Suggestions
List the four phases explicitly (e.g., 'reproduce, diagnose, identify root cause, verify fix') to increase specificity and help Claude understand the concrete actions involved.
Differentiate from other potential debugging skills by specifying what makes this framework unique—e.g., is it for code bugs specifically, production issues, or all software debugging?
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (debugging) and describes the approach (four-phase framework, root cause investigation before fixes), but doesn't list the specific concrete actions or phases involved. 'Never jump to solutions' is a behavioral constraint rather than a capability. | 2 / 3 |
| Completeness | Clearly answers both what ('Four-phase debugging framework that ensures root cause investigation before attempting fixes') and when ('Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes') with explicit trigger guidance. | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms that users would actually say: 'bug', 'test failure', 'unexpected behavior', 'debugging', 'fixes'. These cover common variations of how users describe debugging scenarios. | 3 / 3 |
| Distinctiveness Conflict Risk | While the debugging focus is clear, 'bug, test failure, or unexpected behavior' is quite broad and could overlap with other debugging, testing, or troubleshooting skills. The 'four-phase' and 'root cause before fixes' framing helps somewhat but the trigger scope is very wide. | 2 / 3 |
| Total | | 10 / 12 Passed |
Implementation
47%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill defines a clear, well-structured four-phase debugging workflow with good escalation paths and feedback loops. However, it is significantly over-verbose, spending many tokens on motivational content, rationalizations, and abstract advice that Claude doesn't need. The actionability is moderate — the multi-component diagnostic example is strong, but most guidance is procedural description rather than executable patterns.
Suggestions
Cut the 'Common Rationalizations' table, 'Real-World Impact' statistics, and most of the 'When to Use'/'Don't skip when' sections — these are motivational filler that Claude doesn't need and consume significant tokens.
Replace abstract instructions like 'Read error messages carefully' and 'Form single hypothesis' with concrete executable examples or templates (e.g., a hypothesis template, a specific git command sequence for checking recent changes).
Move the 'Red Flags' list and 'Quick Reference' table to a separate bundle file if they must be retained, keeping SKILL.md focused on the core four-phase process.
Either provide the referenced companion skill files (root-cause-tracing, defense-in-depth-validation, verification-before-completion) as bundle files or inline their key content to avoid dangling references.
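The second suggestion above asks for a hypothesis template and a git command sequence for checking recent changes. One possible shape is sketched below. The function names `format_hypothesis` and `recent_changes` are hypothetical and do not come from the skill; the git subcommands are standard, but the commit count and time window are arbitrary assumptions.

```shell
# Hypothetical helper (not part of the skill): print a one-hypothesis
# template so each debugging attempt states exactly one testable claim.
# Arguments: $1 = claim, $2 = supporting evidence,
#            $3 = smallest test, $4 = result expected if the claim is true
format_hypothesis() {
  printf 'Hypothesis: %s\n' "$1"
  printf 'Evidence: %s\n' "$2"
  printf 'Test: %s\n' "$3"
  printf 'Expected if true: %s\n' "$4"
}

# Hypothetical helper: summarize recent changes in the current repository.
# Must be run inside a git work tree; the limits below are illustrative.
recent_changes() {
  git log --oneline -n 10               # last ten commits, one line each
  git log --since="2 days ago" --stat   # files touched in the last two days
  git status --short                    # uncommitted local modifications
}
```

Calling `format_hypothesis "claim" "evidence" "test" "expected result"` prints the four labeled lines, and `recent_changes` is meant to be run from the repository root before forming the hypothesis.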
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose for what it teaches. The 'When to Use', 'Don't skip when', 'Red Flags', 'Common Rationalizations', and 'Real-World Impact' sections are heavily padded with motivational/persuasive content Claude doesn't need. The same core process could be conveyed in ~40% of the tokens. Phrases like 'Violating the letter of this process is violating the spirit of debugging' and the extensive rationalizations table explain concepts Claude already understands. | 1 / 3 |
| Actionability | The four phases provide a concrete sequence of steps, and the multi-component diagnostic example with bash commands is executable and useful. However, much of the guidance remains abstract ('Read error messages carefully', 'Form single hypothesis', 'Be specific, not vague') rather than providing concrete executable patterns. The Phase 4 code examples are generic test runner invocations rather than meaningful debugging code. | 2 / 3 |
| Workflow Clarity | The four-phase workflow is clearly sequenced with explicit gates ('MUST complete each phase before proceeding'), validation checkpoints (Phase 3 verify step, Phase 4 test verification), and a well-defined feedback loop (return to Phase 1 if fix fails, escalate to architectural review after 3+ failures). The escalation path and error recovery are explicit. | 3 / 3 |
| Progressive Disclosure | References to companion skills (root-cause-tracing, defense-in-depth-validation, verification-before-completion) are mentioned but no bundle files exist to support them. The skill itself is monolithic: the rationalizations table, red flags list, and quick reference could be separate files or omitted. The content that is inline (200+ lines) would benefit from being split. | 2 / 3 |
| Total | | 8 / 12 Passed |
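The feedback loop described under Workflow Clarity (return to Phase 1 if a fix fails, escalate after 3+ failures) can be sketched as a small decision function. This is an illustrative reading of the review's summary, not code from the skill; the name `next_step` and its argument convention are assumptions.

```shell
# Hypothetical sketch of the escalation rule summarized in the review.
# Arguments: $1 = "pass" or "fail" (outcome of verifying the latest fix)
#            $2 = total failed fix attempts so far, including this one
next_step() {
  result=$1
  failures=$2
  if [ "$result" = "pass" ]; then
    echo "done"
  elif [ "$failures" -ge 3 ]; then
    # Three or more failed fixes: stop patching, question the design.
    echo "escalate to architectural review"
  else
    # Fewer than three failures: restart root cause investigation.
    echo "return to Phase 1"
  fi
}
```

For example, `next_step fail 2` selects another investigation pass, while `next_step fail 3` selects escalation.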
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
5e92b71