Reviews repositories, pull requests, diffs, and agent-generated code for reward hacking, fake completion, defensive theater, architectural bypasses, weakened guarantees, hidden fallbacks, and misleading abstractions.
Use this model to rank implementation-integrity findings. Severity measures the impact of the integrity failure. Confidence measures how strongly the evidence supports the claim.
Use Critical when the change creates false confidence around a core contract or safety property.
Examples:
Use High when the implementation materially breaks promised behavior or hides important failure, but the blast radius is narrower than Critical.
Examples:
Use Medium when the issue creates meaningful maintenance or correctness risk but does not currently prove broad user-facing failure.
Examples:
Use Low for localized integrity concerns with limited impact and clear containment.
Examples:
Do not report purely stylistic, formatting, naming, or generic lint concerns as implementation-integrity findings.
Use High confidence when the code, tests, and contract all line up:
Use Medium confidence when the evidence is strong but one part needs verification, such as runtime reachability, deployment configuration, or an external dependency behavior.
Use Low confidence for plausible leads that need confirmation and still carry enough risk to mention. Avoid Low-confidence findings unless the potential impact is High or Critical.
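The two reporting gates above can be expressed as a small filter: the stylistic exclusion and the rule that Low-confidence leads are kept only for High or Critical impact. This is a minimal Python sketch under stated assumptions, not the skill's own code. `Finding`, `is_reportable`, the `category` field, and the labels in `STYLISTIC_CATEGORIES` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Ordered so comparisons follow the rubric: Low < Medium < High < Critical.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


class Confidence(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical labels for concerns that are never implementation-integrity findings.
STYLISTIC_CATEGORIES = {"style", "formatting", "naming", "lint"}


@dataclass
class Finding:
    title: str
    category: str  # hypothetical free-form label, e.g. "hidden-fallback" or "naming"
    severity: Severity
    confidence: Confidence


def is_reportable(finding: Finding) -> bool:
    """Apply the stylistic exclusion and the Low-confidence gate (illustrative sketch)."""
    # Purely stylistic concerns are excluded regardless of severity.
    if finding.category in STYLISTIC_CATEGORIES:
        return False
    # Low-confidence leads survive only when the potential impact is High or Critical.
    if finding.confidence == Confidence.LOW:
        return finding.severity >= Severity.HIGH
    return True


lead = Finding("Retry loop swallows auth errors", "hidden-fallback", Severity.HIGH, Confidence.LOW)
nit = Finding("Inconsistent variable names", "naming", Severity.LOW, Confidence.HIGH)
print(is_reportable(lead), is_reportable(nit))  # True False
```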
Report a finding when:
If those conditions are not met, either omit the issue or list it as a review limit/open question rather than a finding.
Order findings by:
Within the same severity, lead with issues that mislead users, operators, tests, or reviewers before issues that mainly create future maintenance risk.
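The ordering rule can be sketched as a sort key. The full "Order findings by" list is not reproduced in this section, so the sketch assumes severity is the primary key (as "within the same severity" implies) and ignores any other keys that list may contain. `Finding`, `order_findings`, and the `misleads` flag are hypothetical names, and this is an illustration rather than the skill's implementation.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class Finding:
    title: str
    severity: Severity
    # Hypothetical flag: True when the issue misleads users, operators, tests,
    # or reviewers; False when it mainly creates future maintenance risk.
    misleads: bool


def order_findings(findings: list[Finding]) -> list[Finding]:
    # Highest severity first; within a severity, misleading issues first
    # (False sorts before True, so `not f.misleads` puts misleading issues ahead).
    # sorted() is stable, so exact ties keep their original relative order.
    return sorted(findings, key=lambda f: (-f.severity, not f.misleads))


report = order_findings([
    Finding("Duplicated parsing helper", Severity.MEDIUM, misleads=False),
    Finding("Test asserts on a mock it configured itself", Severity.MEDIUM, misleads=True),
    Finding("Auth check bypassed in fallback path", Severity.CRITICAL, misleads=True),
])
print([f.title for f in report])
# ['Auth check bypassed in fallback path', 'Test asserts on a mock it configured itself', 'Duplicated parsing helper']
```

Sorting on a tuple keeps the rule declarative: adding another ordering key means extending the tuple rather than writing comparison logic.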