Applies the scientific method to debugging by helping users form specific, testable hypotheses, design targeted experiments, and systematically confirm or reject theories to find root causes. Use when a user says their code isn't working, they're getting an error, something broke, they want to troubleshoot a bug, or they're trying to figure out what's causing an issue. Concrete actions include isolating failing components, forming and testing hypotheses, analyzing error messages, tracing execution paths, and interpreting test results to narrow down root causes.
Install with Tessl CLI
npx tessl i github:rohitg00/skillkit --skill hypothesis-testing
Does it follow best practices?
If you maintain this skill, you can automatically optimize it using the tessl CLI to improve its score:
npx tessl skill review --optimize ./path/to/skill
You are applying the scientific method to debugging. Form clear hypotheses, design tests that can definitively confirm or reject them, and systematically narrow down to the truth.
Every debugging action should test a specific hypothesis. Random changes are not debugging.
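This discipline can be sketched as a tiny helper that ties each experiment to an explicit prediction. The names below (`testHypothesis`, the field names) are illustrative, not part of any library:

```javascript
// Minimal sketch: a hypothesis is only useful if it comes with a prediction
// that a concrete test can confirm or reject.
function testHypothesis({ statement, predictionIfTrue, runTest }) {
  const observed = runTest();                   // run the targeted experiment
  const confirmed = predictionIfTrue(observed); // compare outcome to prediction
  return { statement, observed, confirmed };
}

// Example: hypothesis that Array.prototype.sort mutates its input.
const input = [3, 1, 2];
const result = testHypothesis({
  statement: 'Array.prototype.sort mutates the input array',
  predictionIfTrue: (after) => after.join() === '1,2,3',
  runTest: () => { input.sort(); return input; },
});
console.log(result.confirmed); // true: sort() sorts in place
```

The point is not the helper itself but the shape: every test is written down with its expected outcome *before* it runs.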
Before forming hypotheses, collect observations:
Write down observations objectively:
```
Observations:
- API returns 500 error on POST /orders
- Happens only when cart has > 10 items
- Started after deployment on 2024-01-15
- Works fine in staging environment
- Error logs show "connection refused" to inventory service
```

Examples (bad → good):

- "The API is broken" → "POST /orders returns 500 when the cart has more than 10 items"
- "It stopped working recently" → "Failures started after the 2024-01-15 deployment"
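An observation list like the one above can also be recorded as plain data, so each hypothesis can later be checked against the evidence it must explain. This is a minimal sketch; the field names are illustrative:

```javascript
// Observations captured as data rather than prose (field names are illustrative).
const observations = [
  { fact: 'POST /orders returns 500', scope: 'production' },
  { fact: 'Fails only when cart has > 10 items', scope: 'production' },
  { fact: 'Started after 2024-01-15 deployment', scope: 'production' },
  { fact: 'Works fine in staging', scope: 'staging' },
  { fact: 'Logs show "connection refused" to inventory service', scope: 'production' },
];

// A credible hypothesis should account for every production-scoped observation.
const mustExplain = observations.filter((o) => o.scope === 'production');
console.log(mustExplain.length); // 4 production observations to explain
```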
For each hypothesis, define what you expect to observe if it is true versus false:
```
Hypothesis: Connection pool exhausted for large orders

If TRUE:
- Active connections should hit max (20) during large orders
- Small orders should still work during this time
- Increasing pool size should fix the issue

If FALSE:
- Connection count stays well below max
- Small orders also fail during the issue
- Pool size change has no effect
```

Design tests that definitively confirm or reject:
```
Test Plan for Connection Pool Hypothesis:

1. Add connection pool monitoring
   - Log active connections before/after each request
   - Expected if true: Count reaches 20 during failures
2. Artificial stress test
   - Send 5 large orders simultaneously
   - Expected if true: Failures start when pool exhausted
3. Increase pool size to 50
   - Repeat stress test
   - Expected if true: Failures stop or threshold moves
4. Control test with small orders
   - Send 20 small orders simultaneously
   - Expected if true: No failures (faster processing)
```

After testing:
```
Results:
- Connection count reached 20/20 during failures ✓
- Small orders succeeded during same period ✓
- Pool size increase to 50 → failures stopped ✓

Conclusion: Hypothesis CONFIRMED
Connection pool exhaustion is the proximate cause.

New question: Why do large orders exhaust the pool?
New hypothesis: Large orders make multiple inventory calls per item
```

## Bug: [Description]
### Hypothesis 1: [Theory]
**Status:** Testing | Confirmed | Rejected
**Probability:** High | Medium | Low
**Evidence For:**
- [Evidence 1]
- [Evidence 2]
**Evidence Against:**
- [Evidence 1]
**Test Plan:**
1. [Test 1] - Expected result if true
2. [Test 2] - Expected result if false
**Test Results:**
- [Result 1]: [Supports/Contradicts]
- [Result 2]: [Supports/Contradicts]
**Conclusion:** [Confirmed/Rejected] because [reasoning]
---
### Hypothesis 2: [Next Theory]
...

```javascript
// Add timing instrumentation
const start = performance.now();
await suspectedSlowOperation();
const duration = performance.now() - start;
console.log(`Operation took ${duration}ms`);
// Hypothesis confirmed if duration > expected
```

```javascript
// Validate data at key points
function processWithValidation(data) {
  console.assert(data.id != null, 'Missing id');
  console.assert(data.items?.length > 0, 'Empty items');
  console.assert(typeof data.total === 'number', 'Invalid total');
  // If assertions fail, data hypothesis likely true
}
```

```javascript
// Snapshot state before and after
const stateBefore = JSON.stringify(currentState);
suspectedStateMutation();
const stateAfter = JSON.stringify(currentState);
if (stateBefore !== stateAfter) {
  // diff() stands in for whatever string/object diff helper you use
  console.log('State changed:', diff(stateBefore, stateAfter));
}
```

```
Is the hypothesis testable?
├── NO → Refine it to be more specific
└── YES → Can I test it without side effects?
    ├── NO → Design a safe test (staging, logs-only)
    └── YES → Run the test
        └── Results conclusive?
            ├── NO → Design a better test
            └── YES → Hypothesis confirmed or rejected?
                ├── CONFIRMED → Root cause found?
                │   ├── YES → Fix and verify
                │   └── NO → Form next hypothesis (why?)
                └── REJECTED → Form next hypothesis
```
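The decision tree above can be folded into a small driver loop. This is an illustrative sketch, not part of the skill's tooling; all names are hypothetical:

```javascript
// Sketch of the decision loop: keep testing hypotheses until a root cause is confirmed.
function debugLoop(hypotheses) {
  for (const h of hypotheses) {
    if (!h.testable) continue;           // not testable → would be refined first
    const confirmed = h.runTest();       // run the safe, side-effect-free test
    if (confirmed && h.isRootCause) {
      return { rootCause: h.statement }; // CONFIRMED and a root cause → fix and verify
    }
    // REJECTED (or only a proximate cause) → fall through to the next hypothesis
  }
  return { rootCause: null };            // exhausted: form new hypotheses
}

const outcome = debugLoop([
  { statement: 'DNS failure', testable: true, isRootCause: true, runTest: () => false },
  { statement: 'Pool exhausted by multiple inventory calls', testable: true, isRootCause: true, runTest: () => true },
]);
console.log(outcome.rootCause); // 'Pool exhausted by multiple inventory calls'
```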