
jbvc/systematic-debugging

Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes

Quality

57%

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

Security by Snyk

Passed

No known issues


Quality

Discovery

14%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This description only specifies when to use the skill but entirely omits what the skill actually does, making it incomplete and unhelpful for skill selection. The trigger terms cover some common scenarios but miss many natural variations. The lack of concrete actions and broad scope make it nearly indistinguishable from any other debugging-related skill.

Suggestions

Add concrete actions describing what the skill does, e.g., 'Systematically diagnoses root causes by analyzing stack traces, reproducing issues, and tracing code paths' or similar specific capabilities.

Expand trigger terms to include common variations like 'error', 'exception', 'crash', 'debug', 'not working', 'broken', 'failing tests', 'stack trace'.

Clarify what distinguishes this skill from other debugging approaches—e.g., is it a specific methodology like bisection, log analysis, or systematic hypothesis testing?
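Applying these suggestions, an improved description might look like the following sketch. The frontmatter field names follow the common skill-file convention; the exact schema and wording are assumptions, not part of the reviewed skill:

```markdown
---
name: systematic-debugging
description: >
  Systematically diagnoses root causes of bugs, test failures, errors,
  exceptions, crashes, and other unexpected behavior by reproducing the
  issue, reading stack traces, forming a single hypothesis at a time,
  and verifying the fix. Use when code is broken, not working, or tests
  are failing, before proposing any fix.
---
```

A description in this shape answers both 'what' and 'when', and carries the missing trigger terms.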

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | The description does not list any concrete actions or capabilities. It says nothing about what the skill actually does—no verbs like 'analyze', 'diagnose', 'trace', or 'inspect'. It only describes when to use it, not what it does. | 1 / 3 |
| Completeness | The description answers 'when' (before proposing fixes for bugs/failures) but completely omits 'what' the skill does. There is no indication of the skill's capabilities or actions, making the 'what' very weak. | 1 / 3 |
| Trigger Term Quality | It includes some natural trigger terms like 'bug', 'test failure', and 'unexpected behavior' that users might mention. However, it misses common variations like 'error', 'crash', 'exception', 'broken', 'not working', 'debug', or 'failing tests'. | 2 / 3 |
| Distinctiveness Conflict Risk | The description is extremely broad—any debugging or troubleshooting skill could match this. Without specifying what methodology, tools, or approach it uses, it would conflict with many other debugging-related skills. | 1 / 3 |
| **Total** | | **5 / 12** |

Passed

Implementation

54%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The skill has an excellent workflow structure, with clear phases, validation gates, and escalation paths, and good progressive disclosure to supporting files. However, it is significantly too verbose, spending many tokens on motivational content, redundant warnings, and rationalization tables that add no actionable value for Claude. Actionability is moderate: while the process is clear, most steps are abstract instructions rather than concrete, executable techniques.

Suggestions

Cut the 'Common Rationalizations' table, 'Real-World Impact' statistics, and 'your human partner's Signals' section entirely—these are motivational content that Claude doesn't need and consume significant tokens.

Consolidate the 'When to Use', 'Red Flags', and 'Don't skip when' sections into a single brief trigger list, removing redundant admonitions about not skipping the process.

Add more concrete, executable examples for each phase—e.g., specific git commands for Phase 1 step 3, a template for writing down hypotheses in Phase 3, or a concrete test-writing example for Phase 4.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | Extremely verbose for a debugging methodology skill. Contains extensive motivational/persuasion content ('Random fixes waste time', 'rushing guarantees rework', common rationalizations table, 'Real-World Impact' statistics) that Claude doesn't need. The 'When to Use', 'Red Flags', 'Common Rationalizations', and 'your human partner's Signals' sections are largely redundant with each other. Much of this is explaining concepts Claude already understands about debugging methodology. | 1 / 3 |
| Actionability | Provides a structured process with some concrete examples (the bash diagnostic instrumentation example is good), but most guidance is procedural/philosophical rather than executable. Steps like 'Read Error Messages Carefully' and 'Form Single Hypothesis' are abstract instructions rather than concrete, copy-paste-ready techniques. The multi-component diagnostic example is a bright spot but is domain-specific. | 2 / 3 |
| Workflow Clarity | The four-phase workflow is clearly sequenced with explicit gates between phases ('MUST complete each phase before proceeding'). Includes validation checkpoints (Phase 3 verify step, Phase 4 verify fix), feedback loops (return to Phase 1 if hypothesis fails), and an escalation path (3+ failures → question architecture). The quick reference table provides a clear summary. | 3 / 3 |
| Progressive Disclosure | Appropriately references supporting techniques in separate files (root-cause-tracing.md, defense-in-depth.md, condition-based-waiting.md) with one-level-deep references. Cross-references related skills (test-driven-development, verification-before-completion). The main file serves as an overview with clear navigation to detailed materials. | 3 / 3 |
| **Total** | | **9 / 12** |
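The bash diagnostic instrumentation example the review praises is not reproduced here, but the underlying technique it appears to use (logging inputs and outputs at each component boundary to isolate the failing component) can be sketched in Python. All names below are illustrative:

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("diagnose")

def instrumented(fn):
    """Log arguments and return value at a component boundary."""
    def wrapper(*args, **kwargs):
        log.debug("enter %s args=%r kwargs=%r", fn.__name__, args, kwargs)
        result = fn(*args, **kwargs)
        log.debug("exit %s -> %r", fn.__name__, result)
        return result
    return wrapper

@instrumented
def parse_config(text):
    # Hypothetical component under suspicion.
    return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)
```

Wrapping each suspect component this way narrows the failure to the first boundary where the inputs look right but the output doesn't.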

Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 Passed

Validation for skill structure

No warnings or errors.
