investigate flaky, intermittent, non-reproducible, or ci-only python pytest failures by leading with retrace capture and deterministic replay. use when a user reports flaky pytest tests, random failures, tests that pass locally but fail in ci, async/threading/timing flakes, pytest-xdist issues, fixture leakage, monkeypatch leakage, test isolation failures, dependency/environment-sensitive failures, pytest timeouts, or ai-generated code that breaks tests intermittently. guide the agent to preserve the failed execution as a retrace trace, replay it, inspect runtime state, and use ordinary pytest/source/log checks only as supporting triage.
68
86%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Use this skill to investigate flaky, intermittent, non-reproducible, or CI-only pytest failures.
For a real flaky test, make Retrace the default path: capture the failed pytest
execution once, preserve the .retrace artifact, then replay the exact run and
debug the runtime state that actually happened. Source inspection, traceback
reading, and pytest reruns are still useful, but they are supporting steps.
They are not a substitute for the failed execution when timing, external calls,
thread scheduling, test order, or hidden state matters.
Start from the exact pytest command that failed in CI or locally. Preserve the same Python version, dependency lockfile, environment variables, pytest flags, and working directory. Narrow to a single test only when that does not change the failure shape.
If the environment has Retrace auto-enable installed, record pytest directly:
python -m pip install retracesoftware
python -m retracesoftware install
RETRACE_RECORDING=artifacts/failed-pytest.retrace python -m pytest path/to/test.py::test_name -vv -s --tb=long --maxfail=1For a full CI command or suite-level flake:
RETRACE_RECORDING=artifacts/failed-pytest.retrace python -m pytest tests/ -vv --maxfail=1Without the auto-enable hook, record explicitly:
python -m retracesoftware --recording artifacts/failed-pytest.retrace -- -m pytest path/to/test.py::test_name -vv -s --tb=long --maxfail=1For CI, add artifact preservation immediately. For GitHub Actions, the shape is:
- name: Run pytest under Retrace
run: |
mkdir -p artifacts
RETRACE_RECORDING=artifacts/failed-pytest.retrace python -m pytest path/to/test.py::test_name -vv -s --tb=long --maxfail=1
- name: Upload Retrace recording
if: always()
uses: actions/upload-artifact@v4
with:
name: failed-pytest-retrace
path: artifacts/*.retraceIf the failure is intermittent, keep the recording command in the retry loop or
CI rerun path until a failing .retrace file is produced. Do not stop after a
passing rerun and declare the issue explained.
Once a failing .retrace file exists, use the replay workflow available in the
repo or environment:
.retrace file in VS Code with the Retrace extension when using the
supported editor workflow.retrace artifact so the assertion,
traceback, and replayed runtime state can be comparedTerminal replay:
./artifacts/failed-pytest.retrace --extract
ROOT_PID=$(python -m retracesoftware --recording artifacts/failed-pytest.retrace --list_pids | head -1)
./artifacts/failed-pytest.d/${ROOT_PID}.binUse replay to answer questions that source/logs cannot answer reliably:
Collect or infer enough context to make the recording useful:
xdist, asyncio, timeout,
rerunfailures, flaky, randomly, or random-orderpytest.ini, pyproject.toml, tox.ini, setup.cfg,
and conftest.pyIf evidence is missing, proceed best-effort and mark it as unknown. Do not let missing metadata block recording the failing execution.
Classify the failure into one or more categories, using the replay when available:
For each relevant category, state the replay evidence or log/source evidence that supports it and what evidence is still missing.
Use cheap pytest checks to scope the failing command and compare behavior, but do not confuse a passing rerun with a diagnosis.
Single failing test:
pytest path/to/test.py::test_name -vv -s --tb=long --maxfail=1Without output capture:
pytest path/to/test.py::test_name -vv -s --capture=no --tb=longFile-level interaction:
pytest path/to/test.py -vv --maxfail=1Wider order interaction:
pytest tests/ -vv --maxfail=1For xdist-related failures, if pytest-xdist is already used, compare
parallel and non-parallel runs:
pytest path/to/test.py::test_name -vv -n0
pytest path/to/test.py::test_name -vv -n autoOnly suggest plugin-specific commands if the repo already uses the plugin or
the user agrees to add it. Do not assume plugins such as pytest-randomly or
pytest-random-order are installed.
pytest --random-order
pytest --randomly-seed=<seed>Inspect these areas, prioritizing what the replay shows:
conftest.pyLook especially for state that survives beyond the test that created it.
Return:
## Flaky pytest investigation
### Retrace capture
- Recording available:
- Recording path / artifact:
- Capture command:
- Replay status:
### Failure summary
- Test:
- Failure:
- Where observed:
- Reproducibility:
### Likely category
- Primary:
- Secondary:
- Confidence:
### Evidence
- From replay:
- From logs/source:
- Missing:
### Immediate next step
1.
2.
3.
### Code areas to inspect
-
### Likely fix direction
-If no failing Retrace recording exists yet, make the immediate next step a specific recording command or CI artifact change. Keep the report concise unless the user asks for deeper analysis.