CtrlK
BlogDocsLog inGet started
Tessl Logo

bug-reproduction-test-generator

Creates minimal, reproducible test cases from bug reports to confirm the defect before and after a fix. Use when a bug is reported without a failing test, when the user needs a regression test for a fix, or when the user asks to reproduce a bug as a test.

Install with Tessl CLI

npx tessl i github:santosomar/general-secure-coding-agent-skills --skill bug-reproduction-test-generator
What are skills?

100

Quality

100%

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

SKILL.md
Review
Evals

Bug Reproduction Test Generator

A bug you can't reliably reproduce is a bug you can't reliably fix. This skill turns prose ("it crashes when I upload a big file") into an executable test that is red on the buggy code and will be green once it's fixed.

Step 1 — Classify the input you have

Input availableFirst move
Stack traceTop project-code frame → target function; exception → assertion
"Steps to reproduce" proseTranslate each step to a setup line; last step → action
Log excerptFind the first anomalous line; grep for the producer
Failing production request (curl, HAR)Replay against a test harness; wrap in an assertion
Screenshot / "it looks wrong"Not machine-checkable — ask for the data, not the UI
"It's intermittent"Do not write a test yet — first stabilize (Step 4)

Step 2 — Identify the triple

Every reproduction test is (setup, action, assertion). Extract each:

  • Setup: the minimal state needed. A user? One user, not a seeded database. A file? The smallest file that triggers it.
  • Action: the single call/request/event that triggers the bug. If the bug needs two actions in sequence, both are the action.
  • Assertion: what should happen, not what currently happens. The test must describe correct behavior, because once fixed this test is your regression guard.

Step 3 — Write, then minimize

Write a test that reproduces. It will be too big. Minimize mechanically:

DimensionMinimization
Setup stateDelete one setup line → still red? Keep deleting. Add back when it goes green.
Input sizeBinary-bisect the input (half the file, half the list) until it goes green
DependenciesInline mocks; if removing a mock turns it green, that mock was load-bearing

The target is: a test so small that when it fails, the fault is obvious from reading the test alone.

Step 4 — Stabilize intermittents

If the bug is non-deterministic, the test can't simply assert correct behavior — it'll pass by luck half the time.

Source of flakinessStabilization
Timing / sleep-basedReplace wall-clock with an injected clock you advance manually
Thread interleavingForce the bad interleaving with a latch/barrier; assert on the state you now deterministically reach
NetworkMock the transport; inject the specific response/error that triggers the bug
Test orderRun the test in isolation; if it passes alone, the bug is test pollution, not product code
RandomnessSeed the RNG with the seed that reproduces (if logged); if not, loop 1000× and assert all pass

If you cannot stabilize, do not ship a test that retries. A retry loop in a reproduction test is an admission you don't understand the bug.

Worked example

Report: "Exporting a CSV with a quote in a column value produces a file Excel can't open."

Triple extraction:

  • Setup: one row with one field containing a double-quote
  • Action: call export_csv(rows)
  • Assertion: output field is RFC-4180 escaped (""" and field is quoted)

Minimal test:

def test_csv_export_escapes_quotes():
    rows = [{"name": 'say "hi"'}]
    out = export_csv(rows)
    assert out == 'name\r\n"say ""hi"""\r\n'

Three lines of meaning. When this fails, the reader immediately sees: quotes aren't escaped.

Edge cases

  • The bug is a hang, not a crash: Assertion is pytest.fail() after the call, with a timeout decorator. The test passes if it reaches fail() within the timeout.
  • The bug is a leak: Can't assert on memory from a unit test. Instead, assert that close()/__exit__ was called on the resource mock.
  • The correct behavior is undefined: Ask. A test must assert something; if nobody knows what "correct" is, you found a requirements bug — → requirement-enhancer.

Do not

  • Do not assert on the current (buggy) behavior. The test should describe what should happen and must currently fail.
  • Do not reproduce via the UI when you can reproduce via the API. UI tests are 10× slower and 10× flakier.
  • Do not add this test to the "flaky" quarantine. If it's flaky, you haven't finished reproducing.
  • Do not skip minimization because "the test works." A 50-line reproduction nobody can read is a 50-line liability.

Handoff

Once red: → bug-localization (test is the signal) → bug-to-patch-generator (test is the oracle).

Repository
santosomar/general-secure-coding-agent-skills
Last updated
Created

Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.