Uses failing test results as signals to guide bug search and narrow down candidate fault locations. Use when one or more tests are failing and the user wants to understand what's broken, when CI reports failures, or when triaging a batch of test failures after a change.
Install with the Tessl CLI:

```shell
npx tessl i github:santosomar/general-secure-coding-agent-skills --skill test-guided-bug-detector93
```
When tests fail, the failure set itself is a signal. One failure tells you where to look; the pattern across many failures tells you what kind of thing broke.
| Failure pattern | Most likely cause | First move |
|---|---|---|
| One test fails | Localized bug in the code that test covers | Read the assertion; → bug-localization |
| Many tests fail with the same error | Shared dependency broke (fixture, helper, import) | Find the shared thing — not the individual tests |
| Many tests fail with different errors | Environment/infra (DB down, fixture not loading) | Check setup/teardown logs, not test bodies |
| All tests in one file fail | Module-level import/fixture in that file | Check the file's top-level, not the tests |
| Tests fail only in CI, not locally | Env difference: version, path, timezone, locale, parallelism | Diff CI env vs local env, not the code |
| Tests fail only when run together | Test pollution — one test mutates shared state | Bisect the test order; find the polluter |
| Same tests intermittently fail | Flake — timing, network, randomness | Do NOT chase the code — stabilize the test |
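The "tests fail only when run together" row can be mechanized: bisect the test order until a single polluter remains. A minimal sketch, assuming a hypothetical `run_tests(order)` callable that runs the given tests in order and returns True when the victim passes (names and shapes here are illustrative, not a real pytest API):

```python
def find_polluter(tests, victim, run_tests):
    """Bisect the test order to find which earlier test makes `victim` fail.

    Assumes at most one polluter and a deterministic run_tests(order)
    that returns True when `victim` passes in that order.
    """
    candidates = [t for t in tests if t != victim]
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # If the victim fails after running only this half, the polluter is in it.
        if not run_tests(half + [victim]):
            candidates = half
        else:
            candidates = candidates[len(half):]
    # Confirm the lone survivor actually causes the failure; otherwise
    # there is no single polluter (or the failure is a flake).
    if candidates and not run_tests(candidates + [victim]):
        return candidates[0]
    return None
```

Each round halves the candidate set, so a 1,000-test suite needs about 10 targeted runs instead of 1,000.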
The classic move: code executed by failing tests but not by passing tests is suspicious.
```
fail_hits / sqrt(total_fails × (fail_hits + pass_hits))
```

This is mechanical but surprisingly effective. You need ≥3 failing and ≥3 passing tests for the signal to separate from noise.
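This formula is the Ochiai metric from spectrum-based fault localization, and it is easy to compute yourself. A minimal sketch, assuming you already have per-test line coverage; the `coverage` dict shape is an assumed input format, not a real coverage-tool API:

```python
import math

def suspiciousness(coverage, failed):
    """Rank lines by Ochiai suspiciousness.

    coverage: test name -> set of covered lines (assumed input shape)
    failed:   set of failing test names
    """
    total_fails = len(failed)
    scores = {}
    for line in set().union(*coverage.values()):
        fail_hits = sum(1 for t in failed if line in coverage[t])
        pass_hits = sum(1 for t in coverage
                        if t not in failed and line in coverage[t])
        denom = math.sqrt(total_fails * (fail_hits + pass_hits))
        scores[line] = fail_hits / denom if denom else 0.0
    # Most suspicious first: lines hit by all failures and no passes score 1.0.
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Lines covered by every failing test and no passing test score exactly 1.0; lines covered only by passing tests score 0.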
Before debugging, group failures that share a root cause. Debugging 20 failures that are secretly 1 bug is 19× wasted effort.
Cluster by, in order:

1. Exact exception type and message
2. Shared fixture or helper
3. File or module
Pick the largest cluster. Fix it. Re-run. Repeat.
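The first clustering pass can be sketched mechanically: group failures by a normalized first-line error signature so N failures with one root cause collapse into one cluster. The `failures` input shape here is an assumption:

```python
import re
from collections import defaultdict

def cluster_failures(failures):
    """Group (test_name, error_text) pairs by a normalized error signature."""
    clusters = defaultdict(list)
    for test, error in failures:
        # Normalize volatile details (hex addresses, numbers) so the same
        # root cause always produces the same signature.
        sig = re.sub(r"0x[0-9a-f]+|\d+", "<n>", error.splitlines()[0])
        clusters[sig].append(test)
    # Largest cluster first: fix it, re-run, repeat.
    return sorted(clusters.items(), key=lambda kv: -len(kv[1]))
```

Signature normalization matters: without it, `ConnectionRefused: port 5432` and `ConnectionRefused: port 5433` look like two bugs instead of one.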
Input: 47 tests failing after a merge.
Triage:
- `KeyError: 'tenant_id'` → same error → one cluster
- `test_billing.py` → file-local → one cluster
- `ConnectionRefused` → infra → ignore for now

Cluster 1 (41 tests): All 41 use the `@with_authenticated_user` fixture. Fixture source: creates a User dict. Grep the diff: `tenant_id` was added as a required field in `User.__init__` but the fixture wasn't updated.
Root cause: One line in conftest.py. 41 failures → 1 bug.
Cluster 2 (5 tests): After fixing cluster 1, re-run. 3 of the 5 now pass (they were also blocked by the fixture). 2 remain. Both assert on a dollar amount that's off by exactly the tax rate. The merge also changed tax calculation.
47 → 2 root causes.
- Errors before any test body runs: the failure is at the `conftest.py`/`setup.py` level, or the test DB didn't come up.
- `True == True` or similar tautology: The test itself is broken — pytest collected an accidentally-named non-test function, or someone committed an `assert True  # TODO` placeholder.

## Clusters

1. <N> failures — <shared root: exception/fixture/file>
   Suspected fault: <file:line> (<how you narrowed it>)
2. ...

## Recommended order

Fix cluster <N> first (<reason: biggest / blocks others / fastest>)

## Quarantine

- <test name>: flaky, <mechanism> — do not chase