Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes
31%: Does it follow best practices?
Impact: no score (no eval scenarios have been run)
Passed: No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./skills/systematic-debugging/SKILL.md
Quality
Discovery
14%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description specifies only when to use the skill and entirely omits what the skill actually does, making it incomplete and unhelpful for skill selection. The trigger terms cover some common scenarios but miss many natural variations. The absence of concrete actions, combined with the broad scope, makes it nearly indistinguishable from any other debugging-related skill.
Suggestions
Add concrete actions describing what the skill does, e.g., 'Systematically diagnoses root causes by analyzing stack traces, reproducing issues, and tracing code paths' or similar specific capabilities.
Expand trigger terms to include common variations like 'error', 'exception', 'crash', 'debug', 'not working', 'broken', 'failing tests', 'stack trace'.
Clarify what distinguishes this skill from other debugging approaches—e.g., is it a specific methodology like bisection, log analysis, or systematic hypothesis testing?
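As a rough illustration of the first two suggestions, the skill's frontmatter might be revised along the following lines. This is only a sketch under stated assumptions: it assumes SKILL.md opens with YAML frontmatter carrying `name` and `description` fields, takes the name `systematic-debugging` from the path in the optimize command, and reuses the example wording and trigger terms from the suggestions above rather than the skill's verified capabilities, which the maintainer would need to confirm.

```yaml
---
# Hypothetical revision: the wording is borrowed from the review's suggestions,
# not from the skill's actual content.
name: systematic-debugging
description: >-
  Systematically diagnoses root causes by analyzing stack traces,
  reproducing issues, and tracing code paths. Use when encountering any bug,
  error, exception, crash, failing tests, or unexpected behavior, or when
  something is broken or not working, before proposing fixes.
---
```

Stating what the skill does first and when to use it second targets the Specificity and Completeness dimensions. Distinctiveness would still depend on naming the skill's actual methodology, for example the four-phase workflow noted under Implementation.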
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description does not list any concrete actions or capabilities. It says nothing about what the skill actually does—no verbs like 'analyze', 'diagnose', 'trace', or 'inspect'. It only describes when to use it, not what it does. | 1 / 3 |
| Completeness | The description answers 'when' (before proposing fixes for bugs/failures) but completely omits 'what' the skill does. There is no indication of the skill's capabilities or actions, making the 'what' very weak. | 1 / 3 |
| Trigger Term Quality | It includes some natural trigger terms like 'bug', 'test failure', and 'unexpected behavior' that users might mention. However, it misses common variations like 'error', 'crash', 'exception', 'broken', 'not working', 'debug', or 'failing tests'. | 2 / 3 |
| Distinctiveness Conflict Risk | The description is extremely broad—any debugging or troubleshooting skill could match this. Without specifying what methodology, tools, or approach it uses, it would conflict with many other debugging-related skills. | 1 / 3 |
| Total | | 5 / 12 (Passed) |
Implementation
47%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill has a well-structured four-phase debugging workflow with clear sequencing and escalation paths, which is its strongest aspect. However, it is severely bloated with motivational content, rationalizations, and behavioral coaching that Claude doesn't need—sections like 'Common Rationalizations,' 'Red Flags,' and 'Real-World Impact' are largely filler. The actionable technical content (the diagnostic instrumentation example, the phase structure) is good but buried under excessive process evangelism.
Suggestions
Cut 'Common Rationalizations,' 'Red Flags,' 'your human partner's Signals,' 'Real-World Impact,' and most of 'When to Use' — these are behavioral coaching that wastes tokens without adding actionable guidance.
Add more concrete, executable examples for each phase (e.g., specific git commands for Phase 1 step 3, a template for hypothesis documentation in Phase 3).
Consolidate the four phases into a compact quick-reference format and move detailed explanations to a separate file, improving progressive disclosure.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~250+ lines. Extensive sections on 'Common Rationalizations,' 'Red Flags,' 'your human partner's Signals,' and 'When to Use' all explain things Claude already knows about debugging discipline. The 'Real-World Impact' statistics are unverifiable filler. Multiple sections repeat the same 'STOP, return to Phase 1' instruction. The content could be cut by 60%+ without losing actionable guidance. | 1 / 3 |
| Actionability | The multi-layer diagnostic bash example is concrete and useful, and the four-phase structure provides a clear methodology. However, most guidance is philosophical rather than executable—it tells Claude what to think ('form single hypothesis,' 'be specific not vague') rather than providing concrete commands or code patterns. The skill is more of a behavioral manifesto than an actionable technical reference. | 2 / 3 |
| Workflow Clarity | The four-phase workflow is clearly sequenced with explicit gates between phases ('MUST complete each phase before proceeding'). Phase 4 includes validation checkpoints (create failing test, verify fix, escalation after 3 failed attempts). The feedback loop for failed hypotheses (return to Phase 1) and the architectural escalation path are well-defined. | 3 / 3 |
| Progressive Disclosure | References to supporting files (root-cause-tracing.md, defense-in-depth.md, condition-based-waiting.md) and related skills are clearly signaled at the end. However, the main SKILL.md itself is monolithic—the rationalizations table, red flags list, and extensive 'When to Use' section could be separated or removed. No bundle files were provided to verify referenced paths exist. | 2 / 3 |
| Total | | 8 / 12 (Passed) |
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
c037ce3