Use when the user wants a test suite audit, test quality or reliability review, regression-protection review, unit/integration/e2e test review, coverage or CI signal assessment, flaky CI investigation, fixture-realism review, spec-drift review, or generated-test validation for AI/LLM/agent-written code. Produces severity-ranked findings for weak assertions, oracle gaps, brittle fixtures, over-mocking, CI trust, and generated-code test risks.
Does it follow best practices? 100%
Impact: 100%, 1.31x. Average score across 3 eval scenarios.
Passed: no known issues.
The engineering team at a small fintech startup has been shipping the invoiceparser library for about six months. The library reads supplier invoice PDFs and extracts structured data — vendor name, invoice number, date, line items with quantities and prices, and a running total. It is used upstream by an automated accounts-payable workflow that posts journal entries to the company's ledger.
A junior developer wrote the initial test suite quickly to meet a sprint deadline. The team has recently seen two production incidents where invoices were parsed incorrectly but the CI pipeline stayed green throughout. The tech lead suspects the tests may not be catching real regressions. Before the team invests in adding more tests, they want an honest picture of what the current suite actually protects against, where the gaps are, and which issues deserve attention first.
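One way a pipeline can stay green while parsing is wrong is an assertion that only checks the shape of the output, not its values. The sketch below is illustrative only: `parse_invoice_buggy`, the comma-separated sample format, and the field names `line_items` and `total` are hypothetical stand-ins, not the actual invoiceparser API, which is not shown here. It injects a regression (the last line item is dropped) and compares a shape-only check with a value-level check.

```python
from decimal import Decimal

# Hypothetical stand-in for the real parser; names and input format are assumptions.
def parse_invoice_buggy(text: str) -> dict:
    """Parse 'name,qty,price' rows, with a deliberately injected regression."""
    items = []
    for row in text.strip().splitlines():
        name, qty, price = row.split(",")
        items.append({"name": name, "qty": int(qty), "price": Decimal(price)})
    items = items[:-1]  # injected regression: the last line item is silently lost
    total = sum((i["qty"] * i["price"] for i in items), Decimal("0"))
    return {"line_items": items, "total": total}

SAMPLE = "widget,2,10.00\ngasket,3,1.50\n"

def weak_check() -> bool:
    """Shape-only assertion: true for almost any non-empty output."""
    result = parse_invoice_buggy(SAMPLE)
    return result is not None and "total" in result and len(result["line_items"]) > 0

def strong_check() -> bool:
    """Value-level assertion: pins the exact total and the line-item names."""
    result = parse_invoice_buggy(SAMPLE)
    names = [item["name"] for item in result["line_items"]]
    return result["total"] == Decimal("24.50") and names == ["widget", "gasket"]

if __name__ == "__main__":
    print("weak check passes:", weak_check())
    print("strong check passes:", strong_check())
```

With the injected bug, the shape-only check still passes while the value-level check fails. That difference is what an audit of assertion quality is looking for.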
The codebase is under inputs/. The source module is inputs/invoiceparser/ and the tests live in inputs/tests/. Project configuration is in inputs/pyproject.toml.
Produce a file named audit_report.md in your working directory containing the full audit. The report must cover the test suite's ability to detect meaningful regressions, the quality of its assertions, and what evidence is missing. Include a prioritised list of remediation actions.
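As a rough sketch of how the prioritised list might be assembled, findings can be recorded with a severity label and sorted before being written out. Everything here is an assumption made for illustration: the `Finding` type, the four severity levels, the heading text, and the sample entries are placeholders, not results from the actual codebase and not a required report format.

```python
from dataclasses import dataclass

# Assumed severity scale; a lower rank means the finding is addressed first.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

@dataclass(frozen=True)
class Finding:
    title: str
    severity: str   # one of the keys in SEVERITY_ORDER
    evidence: str   # where the issue was observed (placeholder text below)

def prioritise(findings: list[Finding]) -> list[Finding]:
    """Sort by severity; sorted() is stable, so ties keep their input order."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])

def to_markdown(findings: list[Finding]) -> str:
    """Render the findings as a numbered markdown list, most severe first."""
    lines = ["## Prioritised remediation actions", ""]
    for n, f in enumerate(prioritise(findings), start=1):
        lines.append(f"{n}. [{f.severity}] {f.title} ({f.evidence})")
    return "\n".join(lines)

# Placeholder findings for illustration only; not taken from any real audit.
EXAMPLE_FINDINGS = [
    Finding("Fixtures do not resemble real invoices", "medium", "placeholder evidence"),
    Finding("Totals are never asserted by value", "critical", "placeholder evidence"),
    Finding("Parser is mocked in integration tests", "high", "placeholder evidence"),
]

if __name__ == "__main__":
    print(to_markdown(EXAMPLE_FINDINGS))
```

Keeping severity as data rather than as free text in the report makes the ordering reproducible when findings are added or re-rated.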