
sharaf/codebase-test-suite-audit

Use when the user wants a test suite audit, test quality or reliability review, regression-protection review, unit/integration/e2e test review, coverage or CI signal assessment, flaky CI investigation, fixture-realism review, spec-drift review, or generated-test validation for AI/LLM/agent-written code. Produces severity-ranked findings for weak assertions, oracle gaps, brittle fixtures, over-mocking, CI trust, and generated-code test risks.

Score: 100
Quality: 100% (Does it follow best practices?)
Impact: 100%, 1.31x average score across 3 eval scenarios
Security (by Snyk): Passed, no known issues


evals/scenario-1/task.md

Test Suite Quality Review: InvoiceParser

Problem/Feature Description

The engineering team at a small fintech startup has been shipping the invoiceparser library for about six months. The library reads supplier invoice PDFs and extracts structured data — vendor name, invoice number, date, line items with quantities and prices, and a running total. It is used upstream by an automated accounts-payable workflow that posts journal entries to the company's ledger.
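The fields listed above suggest one check a test can make without trusting the parser's own arithmetic: the stated total should equal the sum of quantity times unit price across the line items. A minimal sketch, assuming hypothetical `LineItem` and `Invoice` types (the real `invoiceparser` data model is not shown in this task) and `Decimal` for money amounts:

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class LineItem:
    # Hypothetical shape; the real invoiceparser types are not shown in the task.
    description: str
    quantity: Decimal
    unit_price: Decimal

    @property
    def amount(self) -> Decimal:
        return self.quantity * self.unit_price


@dataclass(frozen=True)
class Invoice:
    # Hypothetical container for the fields the task says are extracted.
    vendor: str
    invoice_number: str
    invoice_date: date
    line_items: tuple[LineItem, ...]
    total: Decimal

    def total_matches_lines(self) -> bool:
        # Cross-field invariant: the stated total equals the sum of line amounts.
        return sum((item.amount for item in self.line_items), Decimal("0")) == self.total
```

Using `Decimal` rather than `float` keeps the comparison exact; with floats, `0.1 + 0.2 == 0.3` is false, which would make an invariant like this unreliable.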

A junior developer wrote the initial test suite quickly to meet a sprint deadline. The team has recently seen two production incidents where invoices were parsed incorrectly but the CI pipeline stayed green throughout. The tech lead suspects the tests may not be catching real regressions. Before the team invests in adding more tests, they want an honest picture of what the current suite actually protects against, where the gaps are, and which issues deserve attention first.
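One common way a suite stays green while invoices are parsed incorrectly is assertions that only check that a value came back. The sketch below uses a hypothetical `parse_amount` helper with a deliberately planted regression (cents are truncated); it is illustrative only and not taken from the `invoiceparser` source:

```python
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Hypothetical helper with a planted regression: cents are truncated."""
    whole = text.replace(",", "").split(".")[0]
    return Decimal(whole)


def weak_check(text: str) -> bool:
    # Mirrors `assert result is not None` or an isinstance check:
    # any Decimal passes, so the truncation goes unnoticed.
    result = parse_amount(text)
    return result is not None and isinstance(result, Decimal)


def strong_check(text: str, expected: str) -> bool:
    # Exact-value oracle: compares against a known-correct amount.
    return parse_amount(text) == Decimal(expected)


if __name__ == "__main__":
    print("weak:", weak_check("1,234.50"))                 # weak: True
    print("strong:", strong_check("1,234.50", "1234.50"))  # strong: False
```

Only the exact-value check exposes the bug, which is the pattern the tech lead's suspicion points at.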

The codebase is under inputs/. The source module is inputs/invoiceparser/ and the tests live in inputs/tests/. Project configuration is in inputs/pyproject.toml.

Output Specification

Produce a file named audit_report.md in your working directory containing the full audit. The report must cover the test suite's ability to detect meaningful regressions, the quality of its assertions, and what evidence is missing. Include a prioritised list of remediation actions.
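A prioritised list falls out naturally if each finding carries a severity and the list is sorted on it. A minimal sketch, assuming a hypothetical four-level `Severity` scale and `Finding` record (the skill's actual severity labels and report layout are not specified here):

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical scale; higher values are more urgent.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class Finding:
    title: str
    severity: Severity
    evidence: str  # e.g. a file path or test name backing the finding
    action: str    # the remediation step to take


def render_report(findings: list[Finding]) -> str:
    # Most severe first; equal severities keep their recorded order.
    ordered = sorted(findings, key=lambda f: f.severity, reverse=True)
    lines = ["# Test Suite Audit: InvoiceParser", "", "## Prioritised remediation", ""]
    for i, f in enumerate(ordered, start=1):
        lines.append(f"{i}. [{f.severity.name}] {f.title}: {f.action} (evidence: {f.evidence})")
    return "\n".join(lines) + "\n"
```

Python's `sorted` stays stable even with `reverse=True`, so findings of equal severity keep the order in which they were recorded.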

Files:
  evals/
    scenario-1/
      criteria.json
      task.md
  README.md
  tile.json