hypothesis-testing

Applies the scientific method to debugging by helping users form specific, testable hypotheses, design targeted experiments, and systematically confirm or reject theories to find root causes. Use when a user says their code isn't working, they're getting an error, something broke, they want to troubleshoot a bug, or they're trying to figure out what's causing an issue. Concrete actions include isolating failing components, forming and testing hypotheses, analyzing error messages, tracing execution paths, and interpreting test results to narrow down root causes.

Quality

77%

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Passed

No known issues

Fix and improve this skill with Tessl

tessl review fix ./packages/core/src/methodology/packs/debugging/hypothesis-testing/SKILL.md

Quality

Content

62%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured methodological skill that teaches the scientific debugging approach through a logical 5-step workflow with good feedback loops. Its main weaknesses are moderate verbosity (the extended connection pool example runs through all 5 steps in detail) and the fact that it is more a thinking framework than a set of concrete, executable actions: the code snippets are illustrative rather than directly actionable debugging tools. The content would benefit from tightening and from splitting the tracking template and testing techniques into separate reference files.

Suggestions

Trim the extended connection pool example; it effectively walks through the entire method twice (once in the steps, once implicitly). A single concise example threaded through the steps would suffice.

Move the Hypothesis Tracking Template and Testing Techniques by Hypothesis Type sections into separate bundle files (e.g., TRACKING_TEMPLATE.md, TESTING_TECHNIQUES.md) and reference them from the main skill to improve progressive disclosure.

Make the testing technique code snippets more actionable by showing how Claude should actually instrument a user's code (e.g., 'Insert this at line X' or 'Wrap the suspected function') rather than standalone illustrative snippets.
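As a rough illustration of what "wrap the suspected function" could look like in practice, the sketch below uses a Python decorator to record arguments, return value or exception, and duration for every call. It is not taken from the skill itself; `instrument` and `parse_price` are hypothetical names chosen for the example, and the hypothesis in the comment is invented.

```python
import functools
import logging
import time

logger = logging.getLogger("hypothesis")


def instrument(func):
    """Log arguments, result or exception, and duration for each call to func."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Record the inputs that triggered the failure, then re-raise
            # so the program's behaviour is unchanged.
            logger.exception("%s raised; args=%r kwargs=%r", func.__name__, args, kwargs)
            raise
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("%s(%r, %r) -> %r in %.2f ms",
                    func.__name__, args, kwargs, result, elapsed_ms)
        return result

    return wrapper


# Hypothetical suspect. Assumed hypothesis: inputs containing a currency
# symbol reach this function and make float() fail.
@instrument
def parse_price(text):
    return float(text)
```

Because the wrapper re-raises and returns the original result, it can be added to and removed from a user's code without changing behaviour, which keeps the experiment from contaminating the thing being observed.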

Dimension	Reasoning	Score
Conciseness	The skill is reasonably well-structured but includes some unnecessary verbosity. The extended examples (connection pool scenario) are illustrative but lengthy. The hypothesis tracking template is a full markdown template that adds bulk. Some sections like the decision tree in ASCII art could be more compact.	2 / 3
Actionability	The skill provides concrete examples and code snippets for testing techniques, but much of the content is methodological guidance rather than executable instructions. The code examples are illustrative snippets rather than copy-paste-ready debugging tools. The core workflow is more of a thinking framework than concrete tool usage.	2 / 3
Workflow Clarity	The 5-step scientific debugging method (Observe → Hypothesize → Predict → Test → Analyze) is clearly sequenced with explicit validation at each stage. The decision tree provides clear feedback loops for inconclusive results, rejected hypotheses, and confirmed-but-not-root-cause scenarios. The predict step serves as a built-in validation checkpoint.	3 / 3
Progressive Disclosure	The skill references other skills (root-cause-analysis, trace-and-isolate, red-green-refactor) at the end, which is good. However, the content is somewhat monolithic: the hypothesis tracking template, the testing techniques by type, and the extended connection pool example could be split out. No bundle files exist to support progressive disclosure, and the inline content is lengthy.	2 / 3
	Total	9 / 12 Passed
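
The Observe → Hypothesize → Predict → Test → Analyze loop described under Workflow Clarity can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the skill's actual tracking template: the `Hypothesis`, `Outcome`, `analyze`, and `run` names are hypothetical, and the failure records are toy data standing in for real observations.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Outcome(Enum):
    CONFIRMED = "confirmed"
    REJECTED = "rejected"
    INCONCLUSIVE = "inconclusive"


@dataclass
class Hypothesis:
    statement: str    # Hypothesize: a specific, testable claim
    prediction: str   # Predict: what should be observed if the claim is true
    experiment: Callable[[], Optional[bool]]  # Test: True, False, or None if unclear
    outcome: Optional[Outcome] = None


def analyze(observed: Optional[bool]) -> Outcome:
    """Analyze: map an experiment result onto one of three outcomes."""
    if observed is None:
        return Outcome.INCONCLUSIVE
    return Outcome.CONFIRMED if observed else Outcome.REJECTED


def run(hypotheses: List[Hypothesis]) -> Optional[Hypothesis]:
    """Test hypotheses in order and return the first confirmed one, if any."""
    for h in hypotheses:
        h.outcome = analyze(h.experiment())
        if h.outcome is Outcome.CONFIRMED:
            return h
    return None


# Observe: toy failure records (invented for this example).
failures = [{"pool_size": 5, "concurrent": 9}, {"pool_size": 5, "concurrent": 7}]

candidates = [
    Hypothesis(
        statement="Failures happen only when the payload is empty",
        prediction="Every failure record has an empty payload",
        experiment=lambda: None,  # payload was never logged, so this cannot be judged
    ),
    Hypothesis(
        statement="Failures happen when concurrency exceeds the pool size",
        prediction="Every failure record has concurrent > pool_size",
        experiment=lambda: all(f["concurrent"] > f["pool_size"] for f in failures),
    ),
]

confirmed = run(candidates)
```

An inconclusive result signals that the experiment needs better instrumentation rather than a new theory, and a confirmed hypothesis may still be a symptom, in which case the same loop would be repeated with a deeper hypothesis about why it holds.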

Description

92%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong skill description that excels in specificity, trigger term quality, and completeness. It clearly articulates both the methodology (scientific/hypothesis-driven debugging) and concrete actions, and includes an excellent 'Use when' clause with natural user language. The main weakness is that its trigger terms are quite broad and could conflict with general coding or debugging skills, though the distinctive scientific method framing helps somewhat.

Dimension	Reasoning	Score
Specificity	The description lists multiple specific concrete actions: 'isolating failing components, forming and testing hypotheses, analyzing error messages, tracing execution paths, and interpreting test results to narrow down root causes.' It also describes the methodology clearly (scientific method applied to debugging).	3 / 3
Completeness	Clearly answers both 'what' (applies scientific method to debugging with specific actions listed) and 'when' (explicit 'Use when...' clause with multiple natural trigger scenarios). Both are well-developed and explicit.	3 / 3
Trigger Term Quality	Excellent coverage of natural user language: 'code isn't working', 'getting an error', 'something broke', 'troubleshoot a bug', 'figure out what's causing an issue'. These are phrases users would naturally say when encountering bugs.	3 / 3
Distinctiveness Conflict Risk	While the scientific method / hypothesis-driven approach is a distinctive angle, the trigger terms ('code isn't working', 'getting an error', 'bug') are very broad and could easily overlap with general coding assistance skills, code review skills, or other debugging-related skills.	2 / 3
	Total	11 / 12 Passed

Validation

90%

Warnings & errors only

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 10 / 11 Passed

Validation for skill structure

Criteria	Description	Result
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning

	Total	10 / 11 Passed

Repository: rohitg00/skillkit
Commit: d2e5c34

Reviewed: about 1 month ago
