Disciplined diagnosis loop for hard bugs and performance regressions. Reproduce → minimise → hypothesise → instrument → fix → regression-test. Use when user says "diagnose this" / "debug this", reports a bug, says something is broken/throwing/failing, or describes a performance regression.
90
88%
Does it follow best practices?
Impact
90%
0.97xAverage score across 3 eval scenarios
Passed
No known issues
Feedback loop construction and hypothesis ranking
Feedback loop created
83%
91%
Loop exit codes correct
75%
50%
Multiple hypotheses listed
100%
100%
Falsifiable hypothesis format
80%
100%
Hypotheses ranked
100%
100%
Debug log prefix used
0%
0%
Debug logs removed
80%
100%
Regression test before fix
50%
25%
Regression test added
100%
41%
Correct hypothesis documented
100%
100%
No eval/simulation language
100%
100%
Instrumentation discipline and regression test seam
At least 3 hypotheses
100%
100%
Falsifiable predictions
100%
100%
Targeted instrumentation
100%
100%
One variable at a time
100%
100%
No 'log everything' approach
100%
100%
Regression test added
100%
100%
Seam assessment documented
100%
100%
Root cause in post-mortem
100%
100%
Architectural observation
100%
100%
Debug instrumentation removed
100%
100%
Bug fix present
100%
100%
Performance regression: measure first then bisect
Baseline measurement recorded
100%
100%
Measure before fix
100%
100%
Query plan or profiler used
100%
100%
At least 3 hypotheses
100%
100%
Falsifiable perf predictions
100%
100%
Hypotheses ranked
100%
100%
Fix applied
100%
100%
Benchmark script present
100%
100%
Root cause in post-mortem
100%
100%
Architectural recommendation post-fix
100%
100%
No debug logs in final code
100%
100%
a05d4e5
Table of Contents
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.