
# hypothesis-testing

Applies the scientific method to debugging by helping users form specific, testable hypotheses, design targeted experiments, and systematically confirm or reject theories to find root causes. Use when a user says their code isn't working, they're getting an error, something broke, they want to troubleshoot a bug, or they're trying to figure out what's causing an issue. Concrete actions include isolating failing components, forming and testing hypotheses, analyzing error messages, tracing execution paths, and interpreting test results to narrow down root causes.
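To illustrate the approach the description outlines (the bug and function below are hypothetical, not taken from the skill itself): a vague report like "sorting is broken" becomes a specific, testable hypothesis that a targeted experiment can confirm or reject.

```python
# Hypothetical scenario: a custom sort "sometimes" returns wrong results.
# Vague theory:    "the sort is broken"
# Testable theory: "the sort loses elements when the input has duplicates"

def buggy_dedupe_sort(items):
    # Illustrative bug: set() silently drops duplicates before sorting.
    return sorted(set(items))

# Experiment: an input chosen so the hypothesis predicts a specific failure.
data = [3, 1, 3, 2]
result = buggy_dedupe_sort(data)

# Prediction: if the hypothesis holds, the output is shorter than the input.
assert len(result) < len(data), "hypothesis rejected"
print(f"hypothesis confirmed: {len(data) - len(result)} element(s) lost")
```

A confirmed prediction narrows the search to the deduplicating call; a rejected one sends the investigation back to forming a new hypothesis with the fresh evidence.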

Overall score: 87

| Check | Result | Notes |
| --- | --- | --- |
| Quality | 84% | Does it follow best practices? |
| Impact | Pending | No eval scenarios have been run |
| Security (by Snyk) | Passed | No known issues |


## Quality

### Discovery: 92%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong skill description that clearly articulates what the skill does (hypothesis-driven debugging) and when to use it (with natural user trigger phrases). The concrete actions are well-enumerated and the 'Use when' clause is explicit and comprehensive. The main weakness is that the trigger terms are very general debugging terms that could conflict with other coding/debugging skills, though the scientific method framing provides some differentiation.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: 'isolating failing components, forming and testing hypotheses, analyzing error messages, tracing execution paths, and interpreting test results to narrow down root causes.' Also describes the methodology (scientific method applied to debugging). | 3 / 3 |
| Completeness | Clearly answers both 'what' (applies scientific method to debugging, isolating components, forming hypotheses, analyzing errors, etc.) and 'when' with an explicit 'Use when...' clause listing multiple trigger scenarios. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural user language: 'code isn't working', 'getting an error', 'something broke', 'troubleshoot a bug', 'figure out what's causing an issue'. These are phrases users would naturally say when encountering bugs. | 3 / 3 |
| Distinctiveness / Conflict Risk | While the scientific method / hypothesis-driven approach is a distinctive angle, the trigger terms ('code isn't working', 'getting an error', 'troubleshoot a bug') are very broad and could easily overlap with any general coding assistance or debugging skill. The niche is debugging methodology rather than a specific domain, which increases conflict risk. | 2 / 3 |
| **Total** | | **11 / 12 (Passed)** |

### Implementation: 77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a well-structured debugging methodology skill with a clear sequential workflow and concrete code examples. Its main weakness is length: the extended connection pool example, while illustrative, makes the document verbose for what could be a more concise methodological guide. The content is highly actionable and the workflow is clear with good feedback loops, but the inline bulk would benefit from progressive disclosure into separate files.

Suggestions:

- Move the hypothesis tracking template and testing techniques sections into separate referenced files (e.g., TRACKING_TEMPLATE.md, TESTING_TECHNIQUES.md) to reduce the main skill's token footprint.
- Trim the connection pool running example: one compact example showing the full observe→hypothesize→predict→test→analyze cycle would suffice instead of the extended multi-step version.
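For illustration, a compact single-cycle example along the lines the second suggestion describes might look like the sketch below. The scenario and function names are hypothetical stand-ins, not taken from the skill's own connection pool example.

```python
# One compact observe -> hypothesize -> predict -> test -> analyze cycle.
# Scenario and names are illustrative only.

def parse_price(text):
    # Illustrative bug: int() cannot parse prices with a decimal point.
    return int(text)

# OBSERVE:     parse_price("19.99") raises ValueError in production logs.
# HYPOTHESIZE: parsing fails only for inputs containing a decimal point.
# PREDICT:     integer strings succeed; any "x.y" string raises ValueError.

# TEST: run both classes of input and record the outcomes.
def outcome(value):
    try:
        parse_price(value)
        return "ok"
    except ValueError:
        return "error"

results = {v: outcome(v) for v in ["19", "19.99", "0.5"]}

# ANALYZE: the prediction held, so the hypothesis is confirmed and the
# root cause is the int() call; a failed prediction would loop back to
# the hypothesize step carrying the new evidence.
assert results == {"19": "ok", "19.99": "error", "0.5": "error"}
print(results)
```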

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is reasonably well-structured but includes some unnecessary verbosity. The extended examples (connection pool scenario) are illustrative but lengthy. The hypothesis tracking template is a full markdown template that adds bulk. Some sections, like the decision tree in ASCII art, could be more compact. | 2 / 3 |
| Actionability | The skill provides concrete, executable code examples for timing, data, and state hypothesis testing. The test plan examples are specific and copy-paste ready. The observation format, hypothesis tracking template, and decision tree all give Claude clear, actionable patterns to follow. | 3 / 3 |
| Workflow Clarity | The 5-step scientific debugging method (Observe → Hypothesize → Predict → Test → Analyze) is clearly sequenced with explicit validation at each stage. The decision tree provides a feedback loop for inconclusive results and rejected hypotheses. The predict step serves as a built-in verification checkpoint before and after testing. | 3 / 3 |
| Progressive Disclosure | The skill references other skills at the bottom (root-cause-analysis, trace-and-isolate, red-green-refactor), which is good, but the main content is a monolithic document. The hypothesis tracking template and testing techniques by type could be split into separate referenced files to keep the main skill leaner. | 2 / 3 |
| **Total** | | **10 / 12 (Passed)** |
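The "timing" style of hypothesis test mentioned in the Actionability row can be sketched as follows. The two stage functions are stand-ins invented for this illustration; the skill's own examples may differ.

```python
import time

# Hypothetical stand-ins for two stages of a slow request handler.
def fetch_rows():
    time.sleep(0.05)  # simulated database call
    return list(range(100))

def render(rows):
    return "\n".join(map(str, rows))

# Hypothesis:  "most of the latency is in fetch_rows, not render".
# Prediction:  fetch_rows accounts for the large majority of elapsed time.
def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

rows, t_fetch = timed(fetch_rows)
_, t_render = timed(render, rows)

# Analyze: compare the two measurements against the prediction.
assert t_fetch > t_render, "hypothesis rejected: render dominates"
print(f"fetch: {t_fetch:.3f}s  render: {t_render:.3f}s")
```

Instrumenting each stage separately, rather than timing the whole handler, is what turns a vague "it's slow" observation into a decidable prediction.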

### Validation: 90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 passed.

Validation for skill structure:

| Criteria | Description | Result |
| --- | --- | --- |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| **Total** | | **10 / 11 (Passed)** |
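The usual remedy for this class of warning, as the check's own description suggests, is to nest nonstandard keys under `metadata` rather than leaving them at the top level of the SKILL.md frontmatter. The extra key shown below is hypothetical; consult the skill spec for the authoritative key list.

```yaml
# Before: an unrecognized top-level key triggers the warning.
# name: hypothesis-testing
# author: someone            # <- unknown frontmatter key

# After: the nonstandard key moves under metadata.
name: hypothesis-testing
description: Applies the scientific method to debugging ...
metadata:
  author: someone
```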

Repository: rohitg00/skillkit (Reviewed)
