diagnose

Disciplined diagnosis loop for hard bugs and performance regressions. Reproduce → minimise → hypothesise → instrument → fix → regression-test. Use when user says "diagnose this" / "debug this", reports a bug, says something is broken/throwing/failing, or describes a performance regression.

0.97x

Quality

100%

Does it follow best practices?

Impact

90%

0.97x

Average score across 3 eval scenarios

Securityby

Passed

No known issues

Evaluation results

100%

API Rate Limiter Letting Through Too Many Requests

Instrumentation discipline and regression test seam

Criteria

Baseline

With context

At least 3 hypotheses

100%

Falsifiable predictions

100%

Targeted instrumentation

100%

One variable at a time

100%

No 'log everything' approach

100%

Regression test added

100%

Seam assessment documented

100%

Root cause in post-mortem

100%

Architectural observation

100%

Debug instrumentation removed

100%

Bug fix present

100%

72%

-6%

Disappearing Cart Items After Login

Feedback loop construction and hypothesis ranking

Criteria

Baseline

With context

Feedback loop created

83%

91%

Loop exit codes correct

75%

50%

Multiple hypotheses listed

100%

Falsifiable hypothesis format

80%

100%

Hypotheses ranked

100%

Debug log prefix used

Debug logs removed

80%

100%

Regression test before fix

50%

25%

Regression test added

100%

41%

Correct hypothesis documented

100%

No eval/simulation language

100%

Slow Database Query After Schema Migration

Performance regression: measure first then bisect

Criteria

Baseline

With context

Baseline measurement recorded

100%

Measure before fix

100%

Query plan or profiler used

100%

At least 3 hypotheses

100%

Falsifiable perf predictions

100%

Hypotheses ranked

100%

Fix applied

100%

Benchmark script present

100%

Root cause in post-mortem

100%

Architectural recommendation post-fix

100%

No debug logs in final code

100%

Repository: coder/agent-tty
Commit: fae02cb

Evaluated: about 1 month ago
Agent: Claude Code
Model: Claude Sonnet 4.6

Table of Contents

Disappearing Cart Items After Login API Rate Limiter Letting Through Too Many Requests Slow Database Query After Schema Migration

Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.