
sharaf/codebase-test-suite-audit

Use when the user wants a test suite audit, test quality or reliability review, regression-protection review, unit/integration/e2e test review, coverage or CI signal assessment, flaky CI investigation, fixture-realism review, spec-drift review, or generated-test validation for AI/LLM/agent-written code. Produces severity-ranked findings for weak assertions, oracle gaps, brittle fixtures, over-mocking, CI trust, and generated-code test risks.

Score: 100

Quality: 100% (Does it follow best practices?)

Impact: 100%, 1.31x average score across 3 eval scenarios

Security (by Snyk): Passed, no known issues


Evaluation results

Test Suite Quality Review: InvoiceParser
Weak oracle and assertionless test detection
Score: 100% with context (+47% over without context)

| Criteria | Without context | With context |
|---|---|---|
| Repo brief present | 16% | 100% |
| Evidence inventory table | 0% | 100% |
| Required report sections | 20% | 100% |
| Finding contract fields | 40% | 100% |
| Correct severity classification | 25% | 100% |
| Concrete file references | 62% | 100% |
| Identifies assertionless tests | 100% | 100% |
| Identifies self-referential oracle | 100% | 100% |
| Does not modify code | 100% | 100% |
| Coverage not treated as proof | 50% | 100% |
| TODOs not credited | 100% | 100% |
| Open evidence gaps listed | 0% | 100% |
| Remediation sequenced by risk | 100% | 100% |

Test Suite Reliability Review: PaymentCore
LLM-generated test validity and spec drift audit
Score: 100% with context (+6% over without context)

| Criteria | Without context | With context |
|---|---|---|
| Hallucinated API flagged | 100% | 100% |
| Hallucinated attribute flagged | 100% | 100% |
| Implementation-copying oracle flagged | 100% | 100% |
| Weak assertion flagged | 100% | 100% |
| Financial severity correct | 100% | 100% |
| Assertionless tests NOT rated Critical | 100% | 100% |
| Spec drift addressed | 100% | 100% |
| No trust without evidence | 62% | 100% |
| Finding contract complete | 70% | 100% |
| No code modification | 100% | 100% |
| Remediation not broad rewrite | 100% | 100% |

Test Suite Reliability Audit: ShipFast
Flaky CI signal and fixture realism audit
Score: 100% with context (+18% over without context)

| Criteria | Without context | With context |
|---|---|---|
| Repo brief and evidence inventory | 30% | 100% |
| Sleep-based flakiness flagged | 100% | 100% |
| Shared mutable fixture risk flagged | 100% | 100% |
| Random fixture determinism flagged | 100% | 100% |
| Weak carrier oracle flagged | 100% | 100% |
| CI signal gaps listed | 37% | 100% |
| Finding contract complete | 83% | 100% |
| Severity proportional | 50% | 100% |
| Coverage not treated as proof | 100% | 100% |
| No code modification | 100% | 100% |
| Remediation sequenced by risk | 100% | 100% |

Evaluated with agent Claude Code, model Claude Sonnet 4.6.
