sharaf/codebase-test-suite-audit

Use when the user wants a test suite audit, test quality or reliability review, regression-protection review, unit/integration/e2e test review, coverage or CI signal assessment, flaky CI investigation, fixture-realism review, spec-drift review, or generated-test validation for AI/LLM/agent-written code. Produces severity-ranked findings for weak assertions, oracle gaps, brittle fixtures, over-mocking, CI trust, and generated-code test risks.

evals/scenario-2/task.md

Test Suite Reliability Review: PaymentCore

Problem/Feature Description

Your team has been using an AI coding assistant to accelerate development of paymentcore, an internal Python library that handles payment charges, refunds, and batch processing for the company's e-commerce backend. The assistant helped write both the production code and the accompanying test suite. The library is now used by three downstream services that post ledger entries for every transaction.

A recent code review raised concerns: a senior engineer noticed that some of the tests look plausible on the surface but may not actually catch regressions. Two edge cases reached production in the last quarter without triggering any CI failure. Before the team invests in expanding the test suite or wires it into a stricter release gate, they want an honest picture of what the current tests actually protect against.
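To make the concern concrete, here is a minimal sketch of a test that looks plausible but protects against very little, next to one that pins down behaviour. Everything in it is hypothetical: `charge`, `buggy_charge`, the 3% fee, and the result keys are illustrative stand-ins, not the actual paymentcore API.

```python
from decimal import Decimal

FEE_RATE = Decimal("0.03")  # assumed fee rate, for illustration only


def charge(amount: Decimal) -> dict:
    """Hypothetical stand-in for a charge function."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    fee = (amount * FEE_RATE).quantize(Decimal("0.01"))
    return {"status": "ok", "fee": fee, "total": amount + fee}


def buggy_charge(amount: Decimal) -> dict:
    """Regressed variant: drops the fee and accepts any amount."""
    return {"status": "ok", "fee": Decimal("0.00"), "total": amount}


def weak_test(fn) -> None:
    # Only checks that something truthy with an "ok" status came back.
    result = fn(Decimal("100.00"))
    assert result
    assert result["status"] == "ok"


def strong_test(fn) -> None:
    # Checks the computed values and the rejection of an invalid amount.
    result = fn(Decimal("100.00"))
    assert result["fee"] == Decimal("3.00")
    assert result["total"] == Decimal("103.00")
    try:
        fn(Decimal("0"))
    except ValueError:
        pass
    else:
        raise AssertionError("zero amount should be rejected")
```

Under these assumptions, `weak_test` passes for both `charge` and `buggy_charge`, so it gives no CI signal when the fee logic regresses, while `strong_test` passes only for `charge`.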

The codebase is under inputs/. The source modules are in inputs/payment/, the tests are in inputs/tests/, and project configuration is in inputs/pyproject.toml.

Output Specification

Produce a file named audit_report.md in your working directory containing the full audit. The report must cover the reliability of the existing tests as regression protection, the quality of assertions, and any structural issues that would prevent the test suite from catching real faults. Include a prioritized remediation plan and list any evidence that was unavailable for the review.