Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes
Impact: Pending (no eval scenarios have been run).
Status: Passed (no known issues).
Optimize this skill with Tessl:
`npx tessl skill review --optimize ./.opencode/skills/systematic-debugging/SKILL.md`

Quality
Discovery
14%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This description only specifies when to use the skill but entirely omits what the skill actually does, making it incomplete and vague. The trigger terms cover some common scenarios but miss many natural variations. Without concrete actions or a clear niche, this skill would be difficult to distinguish from other debugging or troubleshooting skills.
Suggestions
Add concrete actions describing what the skill does, e.g., 'Systematically diagnoses root causes by analyzing stack traces, reproducing issues, and isolating failing components'.
Expand trigger terms to include common variations like 'error', 'exception', 'crash', 'debug', 'not working', 'broken', 'failing tests'.
Clarify the skill's distinct niche—what makes this debugging approach different from general troubleshooting? E.g., 'Applies a structured root-cause analysis methodology before proposing fixes'.
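Putting these suggestions together, an improved `SKILL.md` description might look like the sketch below. The frontmatter field names and the wording are illustrative assumptions, not content from the skill itself:

```yaml
# Hypothetical SKILL.md frontmatter; exact field names depend on the skill format in use
name: systematic-debugging
description: >
  Systematically diagnoses root causes before proposing fixes: reproduces the
  issue, reads stack traces, isolates the failing component, and verifies the
  fix with a failing-then-passing test. Use when encountering a bug, error,
  exception, crash, test failure, broken or not-working behavior, or any
  other unexpected behavior.
```

Note how this version states concrete actions (reproduce, isolate, verify), covers the expanded trigger vocabulary, and names the niche (root-cause analysis before fixes) in one description.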
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | The description does not list any concrete actions or capabilities. It says nothing about what the skill actually does—no verbs like 'analyze', 'diagnose', 'trace', or 'inspect'. It only describes when to use it, not what it does. | 1 / 3 |
| Completeness | The description answers 'when' (encountering bugs, test failures, unexpected behavior) but completely omits 'what' the skill does. There is no indication of the actions or methodology the skill provides. | 1 / 3 |
| Trigger Term Quality | It includes some natural trigger terms like 'bug', 'test failure', and 'unexpected behavior' that users might mention. However, it misses common variations like 'error', 'crash', 'exception', 'broken', 'not working', 'debug', or 'failing tests'. | 2 / 3 |
| Distinctiveness / Conflict Risk | The description is extremely broad—any debugging, troubleshooting, or error-handling skill could match this. Without specifying what approach or actions it takes, it would conflict with many other skills that deal with bugs or errors. | 1 / 3 |
| Total | | 5 / 12 (Passed) |
Implementation
54%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill has excellent workflow structure with clear phases, gates, and escalation paths, and good progressive disclosure to supporting materials. However, it is significantly over-verbose, spending many tokens on motivational content, a rationalizations table, and advice Claude already knows (like 'read error messages' and 'don't skip past errors'). Actionability is moderate: the multi-component diagnostic example is strong, but most of the guidance remains abstract process description.
Suggestions
Cut motivational/persuasion content (Common Rationalizations table, Real-World Impact statistics, 'your human partner's Signals' section) — Claude doesn't need to be convinced to follow its own skill instructions.
Remove obvious advice like 'Read Error Messages Carefully' and 'Don't skip past errors' — Claude already knows this. Focus tokens on the non-obvious techniques like the multi-component diagnostic instrumentation pattern.
Add more concrete, executable examples similar to the bash diagnostic example — e.g., specific git commands for 'Check Recent Changes', specific patterns for creating minimal reproduction test cases.
Consolidate the Red Flags list and Common Rationalizations table into a single brief 'Anti-patterns' section with 3-4 key items instead of 20+ overlapping bullet points.
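As a concrete illustration of the third suggestion, a 'Check Recent Changes' step could ship with executable git commands like the sketch below. The repository setup here is synthetic, purely so the example is self-contained; in the real skill these inspection commands would run against the user's repository:

```shell
set -e

# Synthetic setup: a throwaway repo with a working commit and a regressing one.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email debug@example.com
git config user.name "debug"
echo "def add(a, b): return a + b" > calc.py
git add calc.py && git commit -qm "initial working version"
echo "def add(a, b): return a - b" > calc.py
git add calc.py && git commit -qm "refactor arithmetic"

# Check Recent Changes: which commits touched the failing file?
git log --oneline -5 -- calc.py

# What exactly changed in the most recent commit to that file?
git diff HEAD~1 HEAD -- calc.py
```

Running the two inspection commands immediately surfaces the suspect commit ("refactor arithmetic") and the exact line that changed, turning the abstract instruction into a repeatable procedure. For longer histories, `git bisect` extends the same idea.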
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | Extremely verbose for a debugging methodology skill. Much of this content restates things Claude already knows (scientific method, 'read error messages carefully', 'don't skip past errors'). The 'Common Rationalizations' table, 'Red Flags' list, 'your human partner's Signals' section, and 'Real-World Impact' statistics are motivational padding rather than actionable guidance. The skill could be cut by 60%+ without losing useful information. | 1 / 3 |
| Actionability | The multi-component diagnostic instrumentation example with bash commands is concrete and useful. However, most of the skill is abstract process description ('read error messages carefully', 'form single hypothesis') rather than executable guidance. The phases describe what to do conceptually but lack specific tools, commands, or code patterns for most steps. | 2 / 3 |
| Workflow Clarity | The four-phase workflow is clearly sequenced with explicit gates between phases ('MUST complete each phase before proceeding'). Phase 4 includes validation checkpoints (create failing test, verify fix, stop-and-reassess after 3 failures). The escalation path from failed fixes to architectural questioning is a well-defined feedback loop. | 3 / 3 |
| Progressive Disclosure | References to supporting techniques (root-cause-tracing.md, defense-in-depth.md, condition-based-waiting.md) and related skills (test-driven-development, verification-before-completion) are clearly signaled and one level deep. The main content serves as an overview with appropriate pointers to detailed materials. | 3 / 3 |
| Total | | 9 / 12 (Passed) |
Validation
100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
11 / 11 checks passed.
Validation for skill structure: no warnings or errors.