Applies the scientific method to debugging by helping users form specific, testable hypotheses, design targeted experiments, and systematically confirm or reject theories to find root causes. Use when a user says their code isn't working, they're getting an error, something broke, they want to troubleshoot a bug, or they're trying to figure out what's causing an issue. Concrete actions include isolating failing components, forming and testing hypotheses, analyzing error messages, tracing execution paths, and interpreting test results to narrow down root causes.
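To make the hypothesis-test loop concrete, here is a minimal sketch of the kind of experiment the description refers to; the `average` function and the empty-list hypothesis are illustrative examples, not taken from the skill itself:

```python
# A toy failure to investigate: average() crashes on some inputs.
def average(values):
    return sum(values) / len(values)

# Hypothesis: the crash only happens when the input list is empty.
# Prediction if true: an empty list raises, a non-empty list succeeds.
def test_hypothesis():
    results = {}
    for label, case in {"empty": [], "non_empty": [1, 2, 3]}.items():
        try:
            average(case)
            results[label] = "ok"
        except ZeroDivisionError as exc:
            results[label] = f"raised {type(exc).__name__}"
    return results

if __name__ == "__main__":
    # Analyze: {'empty': 'raised ZeroDivisionError', 'non_empty': 'ok'} confirms
    # the hypothesis; the root cause is division by len([]) == 0.
    print(test_hypothesis())
```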
Overall score: 81

Quality: 77% (Does it follow best practices?)
Impact: Pending (no eval scenarios have been run)
Passed: no known issues

Optimize this skill with Tessl:
`npx tessl skill review --optimize ./packages/core/src/methodology/packs/debugging/hypothesis-testing/SKILL.md`

Quality
Discovery
92%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong skill description that clearly articulates both what the skill does and when to use it, with natural trigger terms that match how users actually describe debugging scenarios. The scientific method framing provides a distinctive angle, though the broad debugging triggers could overlap with other coding/debugging skills. The description is well-structured, uses third person voice correctly, and lists concrete actions.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'isolating failing components, forming and testing hypotheses, analyzing error messages, tracing execution paths, and interpreting test results to narrow down root causes.' Also describes the methodology clearly. | 3 / 3 |
| Completeness | Clearly answers both 'what' (applies the scientific method to debugging with specific actions like isolating components, forming hypotheses, analyzing errors) and 'when' (an explicit 'Use when...' clause with multiple natural trigger scenarios). | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural user language: 'code isn't working', 'getting an error', 'something broke', 'troubleshoot a bug', 'figure out what's causing an issue'. These are phrases users would naturally say when encountering bugs. | 3 / 3 |
| Distinctiveness / Conflict Risk | While the scientific-method, hypothesis-driven approach is a distinctive angle, trigger terms like 'code isn't working', 'getting an error', and 'troubleshoot a bug' are very broad and could easily overlap with general coding-assistance or error-fixing skills. The debugging domain is common enough that this could conflict with other debugging-related skills. | 2 / 3 |
| Total | | 11 / 12 Passed |
Implementation
62%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a well-structured methodological skill that clearly teaches the scientific debugging approach with a logical 5-step workflow and good examples. Its main weaknesses are moderate verbosity (the extended connection pool scenario, while illustrative, is lengthy) and the fact that much of the content is process/framework guidance rather than directly executable actions. The workflow clarity is strong with explicit feedback loops and a decision tree.
Suggestions
- Tighten the connection pool example: it is repeated across the Observe, Predict, Test, and Analyze sections; consider using one compact running example or reducing the redundancy.
- Make the testing technique code snippets more actionable by showing how to integrate them into an actual debugging session rather than presenting them as standalone patterns (a minimal sketch of this follows the list).
- Consider extracting the hypothesis tracking template and testing techniques into a separate reference file to improve progressive disclosure and reduce the main skill's length.
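As an illustration of the second suggestion, here is a hedged sketch of how a standalone isolation pattern could be folded into an actual session; `process` is a stand-in for the code under investigation and is not part of the skill:

```python
# Bisect the input to find the smallest failing prefix, assuming the full
# batch is already known to reproduce the failure.
def process(batch):
    # stand-in for the real code under investigation
    if "corrupt" in batch:
        raise ValueError("bad record")

def smallest_failing_prefix(batch):
    lo, hi = 1, len(batch)
    while lo < hi:
        mid = (lo + hi) // 2
        try:
            process(batch[:mid])
            lo = mid + 1   # the prefix passes, so the culprit is later
        except ValueError:
            hi = mid       # the prefix fails, so the culprit is within it
    return batch[:lo]

if __name__ == "__main__":
    records = ["a", "b", "corrupt", "c"]
    print(smallest_failing_prefix(records))  # ['a', 'b', 'corrupt']
```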
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is reasonably well structured but includes some unnecessary verbosity. The extended examples (connection pool scenario) are illustrative but lengthy. The hypothesis tracking template is a large block that could be more compact. Some sections like the decision tree add value, but the overall content could be tightened. | 2 / 3 |
| Actionability | The skill provides concrete examples and code snippets for testing techniques, but much of the content is templated/illustrative rather than directly executable. The code examples are useful, but the core methodology is more a framework/process description than copy-paste-ready debugging commands. The hypothesis tracking template is actionable, but the overall guidance leans toward structured thinking rather than specific executable steps. | 2 / 3 |
| Workflow Clarity | The 5-step scientific method (Observe → Hypothesize → Predict → Test → Analyze) is clearly sequenced with explicit validation checkpoints. The predict step defines expected results for both true and false cases, creating a natural feedback loop (a minimal illustration follows this table). The decision tree provides clear branching logic for what to do when results are inconclusive or hypotheses are rejected. The analyze step explicitly asks whether to form new hypotheses. | 3 / 3 |
| Progressive Disclosure | The skill references other skills (root-cause-analysis, trace-and-isolate, red-green-refactor) at the end, which is good navigation. However, the content is monolithic: the hypothesis tracking template, testing techniques by type, and the extended connection pool example could potentially be split into referenced files. With no bundle files, all content is inline, making it a long single document that could benefit from better structural separation. | 2 / 3 |
| Total | | 9 / 12 Passed |
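The Conciseness and Workflow Clarity rows refer to a hypothesis tracking template and a predict/analyze feedback loop. The skill's own template is not reproduced here, but a minimal sketch of what tracking one hypothesis through that loop can look like (all names, and the connection-pool scenario, are illustrative only) is:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    statement: str              # what we believe is causing the failure
    prediction_if_true: str     # observable outcome if the hypothesis holds
    prediction_if_false: str    # observable outcome if it does not
    result: str = ""            # what the experiment actually showed
    verdict: str = "untested"   # confirmed / rejected / inconclusive

def analyze(hyp: Hypothesis, observed: str) -> Hypothesis:
    """Record the observation and decide whether to confirm, reject, or retest."""
    hyp.result = observed
    if observed == hyp.prediction_if_true:
        hyp.verdict = "confirmed"
    elif observed == hyp.prediction_if_false:
        hyp.verdict = "rejected"      # loop back and form a new hypothesis
    else:
        hyp.verdict = "inconclusive"  # refine the experiment and test again
    return hyp

# One tracked entry from a hypothetical session
h = Hypothesis(
    statement="Connection pool exhaustion causes the request timeouts",
    prediction_if_true="timeouts disappear when the pool size is doubled",
    prediction_if_false="timeouts persist regardless of pool size",
)
print(analyze(h, "timeouts persist regardless of pool size").verdict)  # rejected
```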
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | | 10 / 11 Passed |