systematic-debugging

Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes

35

Quality

31%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./skills/systematic-debugging/SKILL.md

Quality

Discovery

14%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This description only specifies when to use the skill but entirely omits what the skill actually does, making it incomplete and unhelpful for skill selection. The trigger terms cover some common scenarios but miss many natural variations. The lack of concrete actions and broad scope make it nearly indistinguishable from any other debugging-related skill.

Suggestions

Add concrete actions describing what the skill does, e.g., 'Systematically diagnoses root causes by analyzing stack traces, reproducing issues, and tracing code paths' or similar specific capabilities.

Expand trigger terms to include common variations like 'error', 'exception', 'crash', 'debug', 'not working', 'broken', 'failing tests', 'stack trace'.

Clarify what distinguishes this skill from other debugging approaches—e.g., is it a specific methodology like bisection, log analysis, or systematic hypothesis testing?

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | The description does not list any concrete actions or capabilities. It says nothing about what the skill actually does: no verbs like 'analyze', 'diagnose', 'trace', or 'inspect'. It only describes when to use it, not what it does. | 1 / 3 |
| Completeness | The description answers 'when' (before proposing fixes for bugs/failures) but completely omits 'what' the skill does. There is no indication of the skill's capabilities or actions, making the 'what' very weak. | 1 / 3 |
| Trigger Term Quality | It includes some natural trigger terms like 'bug', 'test failure', and 'unexpected behavior' that users might mention. However, it misses common variations like 'error', 'crash', 'exception', 'broken', 'not working', 'debug', or 'failing tests'. | 2 / 3 |
| Distinctiveness Conflict Risk | The description is extremely broad: any debugging or troubleshooting skill could match this. Without specifying what methodology, tools, or approach it uses, it would conflict with many other debugging-related skills. | 1 / 3 |
| Total | | 5 / 12 |

Passed

Implementation

47%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The skill has a well-structured four-phase debugging workflow with clear sequencing and escalation paths, which is its strongest aspect. However, it is severely bloated with motivational content, rationalizations, and behavioral coaching that Claude doesn't need—sections like 'Common Rationalizations,' 'Red Flags,' and 'Real-World Impact' are largely filler. The actionable technical content (the diagnostic instrumentation example, the phase structure) is good but buried under excessive process evangelism.

Suggestions

Cut 'Common Rationalizations,' 'Red Flags,' 'your human partner's Signals,' 'Real-World Impact,' and most of 'When to Use' — these are behavioral coaching that wastes tokens without adding actionable guidance.

Add more concrete, executable examples for each phase (e.g., specific git commands for Phase 1 step 3, a template for hypothesis documentation in Phase 3).

Consolidate the four phases into a compact quick-reference format and move detailed explanations to a separate file, improving progressive disclosure.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | Extremely verbose at ~250+ lines. Extensive sections on 'Common Rationalizations,' 'Red Flags,' 'your human partner's Signals,' and 'When to Use' all explain things Claude already knows about debugging discipline. The 'Real-World Impact' statistics are unverifiable filler. Multiple sections repeat the same 'STOP, return to Phase 1' instruction. The content could be cut by 60%+ without losing actionable guidance. | 1 / 3 |
| Actionability | The multi-layer diagnostic bash example is concrete and useful, and the four-phase structure provides a clear methodology. However, most guidance is philosophical rather than executable: it tells Claude what to think ('form single hypothesis,' 'be specific not vague') rather than providing concrete commands or code patterns. The skill is more of a behavioral manifesto than an actionable technical reference. | 2 / 3 |
| Workflow Clarity | The four-phase workflow is clearly sequenced with explicit gates between phases ('MUST complete each phase before proceeding'). Phase 4 includes validation checkpoints (create failing test, verify fix, escalation after 3 failed attempts). The feedback loop for failed hypotheses (return to Phase 1) and the architectural escalation path are well-defined. | 3 / 3 |
| Progressive Disclosure | References to supporting files (root-cause-tracing.md, defense-in-depth.md, condition-based-waiting.md) and related skills are clearly signaled at the end. However, the main SKILL.md itself is monolithic: the rationalizations table, red flags list, and extensive 'When to Use' section could be separated or removed. No bundle files were provided to verify referenced paths exist. | 2 / 3 |
| Total | | 8 / 12 |

Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository
lucianghinda/superpowers-ruby
Reviewed
