Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes
Quality: 55%
Does it follow best practices?

Impact: No eval scenarios have been run.
Passed: No known issues.
Discovery
14%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description only specifies when to use the skill but entirely omits what the skill actually does, making it incomplete and unhelpful for skill selection. The trigger terms cover some common scenarios but miss many natural variations. The lack of concrete actions and broad scope make it nearly indistinguishable from any other debugging-related skill.
Suggestions
- Add concrete actions describing what the skill does, e.g., 'Systematically diagnoses root causes by analyzing stack traces, reproducing issues, and tracing code paths' or similar specific capabilities.
- Expand trigger terms to include common variations like 'error', 'exception', 'crash', 'debug', 'not working', 'broken', 'failing tests', and 'stack trace'.
- Specify what distinguishes this skill from other debugging approaches: does it follow a specific methodology, use particular tools, or focus on a certain type of analysis?
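Taken together, these suggestions might yield something like the following sketch, assuming the usual SKILL.md YAML frontmatter; the name and wording here are illustrative, not the skill's actual text:

```yaml
# Hypothetical rewrite only: the skill's real name and phrasing may differ.
name: systematic-debugging
description: >
  Systematically diagnoses root causes by reading stack traces completely,
  reproducing the failure, and tracing code paths before proposing fixes.
  Use when encountering any bug, error, exception, crash, test failure,
  broken or not-working behavior, or other unexpected behavior.
```

A description in this shape answers both 'what' (diagnose, reproduce, trace) and 'when' (the expanded trigger terms), which addresses all three suggestions at once.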
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description does not list any concrete actions or capabilities. It says nothing about what the skill actually does: no verbs like 'analyze', 'diagnose', 'trace', or 'inspect'. It only describes when to use it, not what it does. | 1 / 3 |
| Completeness | The description answers 'when' (before proposing fixes for bugs/failures) but completely omits 'what' the skill does. There is no indication of the skill's capabilities or actions, making the 'what' very weak. | 1 / 3 |
| Trigger Term Quality | It includes some natural trigger terms like 'bug', 'test failure', and 'unexpected behavior' that users might mention. However, it misses common variations like 'error', 'crash', 'exception', 'broken', 'not working', 'debug', or 'failing tests'. | 2 / 3 |
| Distinctiveness / Conflict Risk | The description is extremely broad: any debugging or troubleshooting skill could match it. Without specifying what methodology, tools, or approach it uses, it would conflict with many other debugging-related skills. | 1 / 3 |
| Total |  | 5 / 12 (Passed) |
Implementation
47%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill has an excellent workflow structure with clear phases, gates, and escalation paths, making it strong on workflow clarity. However, it is significantly overly verbose, repeating its core message (investigate before fixing) in numerous forms: tables, bullet lists, red flags, rationalizations, and motivational statistics. The content would benefit greatly from cutting 40-50% of the text while preserving the four-phase structure and the concrete diagnostic example.
Suggestions
- Cut the 'Common Rationalizations' table, 'Red Flags' list, 'your human partner's Signals' section, and 'Real-World Impact' statistics: these restate the same principle Claude already understands and add ~80 lines of redundancy.
- Remove the extensive 'When to Use' and 'Don't skip when' bullet lists; a single sentence like 'Use for any technical issue before proposing fixes' suffices.
- Move the multi-component diagnostic example and the Phase 4 architecture-questioning content into separate referenced files to improve progressive disclosure and reduce the main file's length.
- Add more concrete, executable examples beyond the bash diagnostic snippet, e.g., a Python debugging trace example or a specific git bisect workflow.
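As a sketch of the last suggestion, here is the kind of minimal Python trace example the skill could embed — all function names below are invented for illustration, not taken from the skill itself:

```python
# Hypothetical sketch: record the call path that leads to a result
# instead of guessing at it, using the standard-library trace hook.
import sys

calls = []  # function names, in the order they were entered

def tracer(frame, event, arg):
    if event == "call":
        calls.append(frame.f_code.co_name)
    return tracer

def helper(x):   # stand-in for a suspect function
    return x * 2

def target():    # stand-in for the failing entry point
    return helper(3) + helper(4)

sys.settrace(tracer)   # record every Python-level call while active
result = target()
sys.settrace(None)

print(result, calls)   # → 14 ['target', 'helper', 'helper']
```

An example like this is directly runnable, so an agent following the skill can verify its hypothesis about the call path rather than reasoning about it abstractly.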
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose at ~250+ lines. Extensive sections on rationalizations, red flags, 'your human partner's signals,' and motivational content ('random fixes waste time') are things Claude already knows or doesn't need repeated in multiple forms. The 'When to Use' section alone has 15+ bullet points, including obvious advice like 'rushing guarantees rework.' The same core message (don't fix without investigating) is restated in at least 5 different ways throughout. | 1 / 3 |
| Actionability | The multi-component diagnostic example with bash commands is concrete and useful, and the four-phase structure provides a clear methodology. However, most guidance is procedural advice rather than executable commands or code. Phrases like 'Read stack traces completely' and 'Form Single Hypothesis' are instructional but not concretely actionable in the way executable code would be. The skill is more of a philosophical framework than a concrete tool. | 2 / 3 |
| Workflow Clarity | The four-phase workflow is clearly sequenced with explicit gates between phases ('MUST complete each phase before proceeding'). Phase 4 includes validation checkpoints (test passes, no regressions), a feedback loop (if the fix doesn't work, return to Phase 1), and an escalation path at 3+ failures. The quick reference table provides a clear summary with success criteria for each phase. | 3 / 3 |
| Progressive Disclosure | References to supporting files (root-cause-tracing.md, defense-in-depth.md, condition-based-waiting.md) and related skills are well-signaled at the end. However, the main SKILL.md itself is monolithic, with substantial content that could be split out (e.g., the rationalizations table, red flags section, real-world impact stats). No bundle files were provided to verify that referenced paths exist. | 2 / 3 |
| Total |  | 8 / 12 (Passed) |
Validation
100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.