Use when the user wants a test suite audit, test quality or reliability review, regression-protection review, unit/integration/e2e test review, coverage or CI signal assessment, flaky CI investigation, fixture-realism review, spec-drift review, or generated-test validation for AI/LLM/agent-written code. Produces severity-ranked findings for weak assertions, oracle gaps, brittle fixtures, over-mocking, CI trust, and generated-code test risks.
The shipfast library schedules warehouse shipment batches and dispatches
orders to carrier APIs. It is used by an operations backend where duplicate
dispatches, missed shipments, or incorrect carrier payloads can create customer
support incidents.
The team reports that the tests are "mostly green" locally, but CI failures are often rerun and the nightly job sometimes flakes with no clear owner. Before tightening release gates, the team wants an audit of whether the current tests provide a reliable regression signal or mostly exercise happy paths.
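Rerunning red CI jobs until they pass hides whether a failure is a real regression or nondeterminism. One alternative is to run each suspect test several times and classify it by its outcomes. The sketch below shows one way to do this. It assumes you have already collected per-run pass/fail booleans, for example by running the suite repeatedly with a rerun/repeat plugin. The helpers `classify_runs` and `flake_rate` and the test IDs are hypothetical. They are not part of shipfast or its test suite.

```python
from typing import Dict, List


def classify_runs(outcomes: Dict[str, List[bool]]) -> Dict[str, str]:
    """Classify each test from repeated runs (True = passed).

    Hypothetical helper: 'stable-pass' if every run passed, 'stable-fail'
    if every run failed, 'flaky' if results were mixed, and 'no-data' if
    the test never ran (missing evidence, which must not count as a pass).
    """
    verdicts: Dict[str, str] = {}
    for test_id, runs in outcomes.items():
        if not runs:
            verdicts[test_id] = "no-data"
        elif all(runs):
            verdicts[test_id] = "stable-pass"
        elif not any(runs):
            verdicts[test_id] = "stable-fail"
        else:
            verdicts[test_id] = "flaky"
    return verdicts


def flake_rate(runs: List[bool]) -> float:
    """Fraction of runs that failed; 0.0 when there are no runs."""
    return runs.count(False) / len(runs) if runs else 0.0


if __name__ == "__main__":
    # Hypothetical outcomes from five repeated runs of two tests.
    results = {
        "test_batch_window": [True, False, True, True, False],
        "test_carrier_payload": [True] * 5,
    }
    for name, verdict in classify_runs(results).items():
        print(name, verdict, flake_rate(results[name]))
```

With this approach, a test classified as flaky gets an owner and a root-cause investigation. It does not get an automatic retry, so a green CI run keeps its meaning as a regression signal.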
The codebase is under inputs/. Source modules are in inputs/shipfast/,
tests are in inputs/tests/, and project configuration is in
inputs/pyproject.toml.
Produce a file named audit_report.md in your working directory containing the
full audit. The report must assess flakiness and determinism, fixture realism,
CI signal quality, assertion/oracle strength, and missing evidence. Include a
prioritized remediation plan with verification steps.