Applies the scientific method to debugging by helping users form specific, testable hypotheses, design targeted experiments, and systematically confirm or reject theories to find root causes. Use when a user says their code isn't working, they're getting an error, something broke, they want to troubleshoot a bug, or they're trying to figure out what's causing an issue. Concrete actions include isolating failing components, forming and testing hypotheses, analyzing error messages, tracing execution paths, and interpreting test results to narrow down root causes.
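One of the listed actions, isolating failing components, can be made concrete with a small sketch. The `process` function and its divide-by-zero bug below are hypothetical, invented purely for illustration; they are not from the skill itself:

```python
# A minimal sketch of "isolating failing components" by bisecting a
# failing input down to the smallest slice that still reproduces the bug.

def process(items):
    # Hypothetical component under test: fails whenever the input contains 0.
    return [100 // x for x in items]

def shrink_failing_input(items):
    """Bisect a failing input, keeping whichever half still fails,
    until the failure is isolated to a minimal slice."""
    while len(items) > 1:
        mid = len(items) // 2
        for half in (items[:mid], items[mid:]):
            try:
                process(half)
            except Exception:
                items = half  # this half alone reproduces the bug
                break
        else:
            break  # neither half fails alone; the combination is needed
    return items

print(shrink_failing_input([3, 7, 0, 5, 9]))  # [0]
```

Each bisection step is itself a small experiment: the "hypothesis" is that one half of the input is sufficient to reproduce the failure, and running `process` on that half confirms or rejects it.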
| Metric | Result | Notes |
|---|---|---|
| Overall | 87 | |
| Quality | 84% | Does it follow best practices? |
| Impact | Pending | No eval scenarios have been run |
| Validation | Passed | No known issues |
Quality
Discovery — 92%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly articulates what the skill does (hypothesis-driven debugging) and when to use it (with natural user trigger phrases). The concrete actions are well-enumerated and the 'Use when' clause is explicit and comprehensive. The main weakness is that the trigger terms are very general debugging terms that could conflict with other coding/debugging skills, though the scientific method framing provides some differentiation.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'isolating failing components, forming and testing hypotheses, analyzing error messages, tracing execution paths, and interpreting test results to narrow down root causes.' Also describes the methodology (applying the scientific method to debugging). | 3 / 3 |
| Completeness | Clearly answers both 'what' (applies the scientific method to debugging: isolating components, forming hypotheses, analyzing errors, etc.) and 'when', with an explicit 'Use when...' clause listing multiple trigger scenarios. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural user language: 'code isn't working', 'getting an error', 'something broke', 'troubleshoot a bug', 'figure out what's causing an issue'. These are phrases users would naturally say when encountering bugs. | 3 / 3 |
| Distinctiveness / Conflict Risk | While the scientific-method, hypothesis-driven approach is a distinctive angle, the trigger terms ('code isn't working', 'getting an error', 'troubleshoot a bug') are very broad and could easily overlap with any general coding-assistance or debugging skill. The niche is debugging methodology rather than a specific domain, which increases conflict risk. | 2 / 3 |
| Total | | 11 / 12 Passed |
Implementation — 77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured debugging methodology skill with clear sequential workflow and concrete code examples. Its main weakness is length — the extended connection pool example, while illustrative, makes the document verbose for what could be a more concise methodological guide. The content is highly actionable and the workflow is clear with good feedback loops, but the inline bulk could benefit from progressive disclosure into separate files.
Suggestions
- Move the hypothesis tracking template and testing techniques sections into separate referenced files (e.g., TRACKING_TEMPLATE.md, TESTING_TECHNIQUES.md) to reduce the main skill's token footprint.
- Trim the connection pool running example — one compact example showing the full observe→hypothesize→predict→test→analyze cycle would suffice instead of the extended multi-step version.
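As a sketch of the shape such a compact example could take, here is one pass through the full observe→hypothesize→predict→test→analyze cycle. The pool simulation, class names, and numbers below are invented for illustration; they are not the skill's actual connection-pool example:

```python
# Observe: requests start failing once roughly ten calls have been made.
# Hypothesize: the connection pool is exhausted because connections are
#              acquired but never released.
# Predict: if true, the live-connection count grows with every call and
#          the call after the pool limit raises.

class Pool:
    def __init__(self, limit):
        self.limit = limit
        self.in_use = 0

    def acquire(self):
        if self.in_use >= self.limit:
            raise RuntimeError("pool exhausted")
        self.in_use += 1

    def release(self):
        self.in_use -= 1

def handler(pool):
    pool.acquire()  # bug under investigation: no matching release()

# Test: run the experiment and take the measurement.
pool = Pool(limit=10)
for _ in range(10):
    handler(pool)
print(pool.in_use)  # prints 10: every connection leaked, as predicted

# Analyze: in_use reached the limit with no releases, so the hypothesis
# is confirmed; the fix is to release in a finally block.
```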
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is reasonably well-structured but includes some unnecessary verbosity. The extended examples (connection pool scenario) are illustrative but lengthy. The hypothesis tracking template is a full markdown template that adds bulk. Some sections like the decision tree in ASCII art could be more compact. | 2 / 3 |
| Actionability | The skill provides concrete, executable code examples for timing, data, and state hypothesis testing. The test plan examples are specific and copy-paste ready. The observation format, hypothesis tracking template, and decision tree all give Claude clear, actionable patterns to follow. | 3 / 3 |
| Workflow Clarity | The 5-step scientific debugging method (Observe → Hypothesize → Predict → Test → Analyze) is clearly sequenced with explicit validation at each stage. The decision tree provides a feedback loop for inconclusive results and rejected hypotheses. The predict step serves as a built-in verification checkpoint before and after testing. | 3 / 3 |
| Progressive Disclosure | The skill references other skills at the bottom (root-cause-analysis, trace-and-isolate, red-green-refactor), which is good, but the main content is a monolithic document. The hypothesis tracking template and testing techniques by type could be split into separate referenced files to keep the main skill leaner. | 2 / 3 |
| Total | | 10 / 12 Passed |
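The hypothesis tracking and decision-tree feedback loop described above could be represented in code roughly as follows. The class names, fields, and status strings are assumptions chosen for illustration, not the skill's actual template:

```python
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    statement: str
    prediction: str
    status: str = "untested"  # untested | confirmed | rejected | inconclusive

@dataclass
class DebugLog:
    hypotheses: list = field(default_factory=list)

    def record(self, hypothesis, result):
        """Record a test outcome and return the next step, mirroring the
        decision tree's feedback loops."""
        hypothesis.status = result
        self.hypotheses.append(hypothesis)
        if result == "inconclusive":
            return "refine-test"      # loop back: design a sharper test
        if result == "rejected":
            return "new-hypothesis"   # loop back: form the next hypothesis
        return "root-cause-found"

log = DebugLog()
h = Hypothesis("pool is exhausted", "in_use hits the limit")
print(log.record(h, "confirmed"))  # root-cause-found
```

The point of such a structure is the explicit routing on "inconclusive" and "rejected": the loop only terminates when a hypothesis is confirmed, which is what makes the method systematic.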
Validation — 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 Passed |