Use when the user wants a test suite audit, test quality or reliability review, regression-protection review, unit/integration/e2e test review, coverage or CI signal assessment, flaky CI investigation, fixture-realism review, spec-drift review, or generated-test validation for AI/LLM/agent-written code. Produces severity-ranked findings for weak assertions, oracle gaps, brittle fixtures, over-mocking, CI trust, and generated-code test risks.
100
Does it follow best practices? 100%
Impact: 100% (1.31x average score across 3 eval scenarios)
Passed: no known issues
Weak oracle and assertionless test detection
Repo brief present: 16%, 100%
Evidence inventory table: 0%, 100%
Required report sections: 20%, 100%
Finding contract fields: 40%, 100%
Correct severity classification: 25%, 100%
Concrete file references: 62%, 100%
Identifies assertionless tests: 100%, 100%
Identifies self-referential oracle: 100%, 100%
Does not modify code: 100%, 100%
Coverage not treated as proof: 50%, 100%
TODOs not credited: 100%, 100%
Open evidence gaps listed: 0%, 100%
Remediation sequenced by risk: 100%, 100%
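For readers unfamiliar with the two oracle problems named in this scenario, the following is a minimal Python sketch of what an assertionless test and a self-referential oracle look like next to a test with an independent expected value. The function `apply_discount` and all test names are hypothetical illustrations, not taken from the eval fixtures.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test: reduce a price by a percentage."""
    return round(price * (1 - percent / 100), 2)


def test_discount_assertionless():
    # Assertionless: the code runs, but the test can only fail if it raises.
    apply_discount(200.0, 15)


def test_discount_self_referential():
    # Self-referential oracle: the expected value is produced by the code
    # under test, so a wrong formula would still pass.
    expected = apply_discount(200.0, 15)
    assert apply_discount(200.0, 15) == expected


def test_discount_independent_oracle():
    # Independent oracle: the expected value is worked out by hand
    # (15% of 200 is 30, so the discounted price is 170).
    assert apply_discount(200.0, 15) == 170.0
```

Only the third test would catch a regression in the discount formula; the first two would keep passing and still count toward coverage.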
LLM-generated test validity and spec drift audit
Hallucinated API flagged: 100%, 100%
Hallucinated attribute flagged: 100%, 100%
Implementation-copying oracle flagged: 100%, 100%
Weak assertion flagged: 100%, 100%
Financial severity correct: 100%, 100%
Assertionless tests NOT rated Critical: 100%, 100%
Spec drift addressed: 100%, 100%
No trust without evidence: 62%, 100%
Finding contract complete: 70%, 100%
No code modification: 100%, 100%
Remediation not broad rewrite: 100%, 100%
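One way a hallucinated API can hide in generated tests is behind a loose mock, which accepts any method name. The sketch below, using Python's standard `unittest.mock`, contrasts a plain `MagicMock` with `create_autospec`; the `PaymentClient` class and the `refund_all` call are hypothetical and stand in for whatever real interface a generated test claims to exercise.

```python
from unittest import mock


class PaymentClient:
    """Hypothetical client; it defines charge() and nothing else."""

    def charge(self, amount_cents: int) -> str:
        return f"charged:{amount_cents}"


def hallucinated_call_passes_on_loose_mock() -> bool:
    # MagicMock invents attributes on demand, so a method that does not
    # exist on PaymentClient is accepted silently.
    client = mock.MagicMock()
    client.refund_all(100)
    return True


def hallucinated_call_fails_on_autospec() -> bool:
    # create_autospec restricts the mock to the real class interface,
    # so the nonexistent method raises AttributeError.
    client = mock.create_autospec(PaymentClient, instance=True)
    try:
        client.refund_all(100)
    except AttributeError:
        return True
    return False
```

A test built on the loose mock passes without ever touching a real method, which is the kind of evidence gap an audit would flag rather than trust.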
Flaky CI signal and fixture realism audit
Repo brief and evidence inventory: 30%, 100%
Sleep-based flakiness flagged: 100%, 100%
Shared mutable fixture risk flagged: 100%, 100%
Random fixture determinism flagged: 100%, 100%
Weak carrier oracle flagged: 100%, 100%
CI signal gaps listed: 37%, 100%
Finding contract complete: 83%, 100%
Severity proportional: 50%, 100%
Coverage not treated as proof: 100%, 100%
No code modification: 100%, 100%
Remediation sequenced by risk: 100%, 100%
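The sleep-based flakiness and random-fixture criteria above can be illustrated with a short Python sketch. It assumes a hypothetical background job and order fixture (`run_job_in_background`, `make_order_fixture`); the point is only to show waiting on an explicit completion signal instead of a fixed sleep, and seeding a generator so fixture data is reproducible.

```python
import random
import threading


def make_order_fixture(seed: int) -> dict:
    """Hypothetical fixture: a seeded generator keeps 'random' data reproducible."""
    rng = random.Random(seed)
    return {"order_id": rng.randint(1000, 9999), "quantity": rng.randint(1, 5)}


def run_job_in_background(done: threading.Event, result: dict) -> threading.Thread:
    """Hypothetical async job that signals completion explicitly."""

    def work() -> None:
        result["status"] = "shipped"
        done.set()

    thread = threading.Thread(target=work)
    thread.start()
    return thread


def test_job_completes_without_sleep():
    done, result = threading.Event(), {}
    thread = run_job_in_background(done, result)
    # Wait on the completion signal with an upper bound, rather than
    # sleeping for a fixed interval and hoping the job has finished.
    assert done.wait(timeout=5.0)
    thread.join()
    assert result["status"] == "shipped"


def test_fixture_is_deterministic():
    # The same seed must always yield the same fixture data.
    assert make_order_fixture(seed=42) == make_order_fixture(seed=42)
```

The timeout here is only a failure bound: a healthy run returns as soon as the event is set, so the test is neither slow nor dependent on machine load.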