
evilissimo/implementation-integrity-review

Reviews repositories, pull requests, diffs, and agent-generated code for reward hacking, fake completion, defensive theater, architectural bypasses, weakened guarantees, hidden fallbacks, and misleading abstractions.

98

1.09x
Quality: 97% (Does it follow best practices?)

Impact: 100%, 1.09x (average score across 6 eval scenarios)

Security by Snyk: Passed, no known issues


Evaluation results

100%

Test-Gaming Integrity Review

| Criteria | Without context | With context |
| --- | --- | --- |
| Identifies reward hacking | 100% | 100% |
| Critical severity | 100% | 100% |
| Evidence | 100% | 100% |
| Explains false confidence | 100% | 100% |
| Affected files | 100% | 100% |
| Remediation | 100% | 100% |
| Avoids benign framing | 100% | 100% |

100%

Fake Provider Abstraction Integrity Review

| Criteria | Without context | With context |
| --- | --- | --- |
| Correct category | 100% | 100% |
| Severity | 100% | 100% |
| Single implementation evidence | 100% | 100% |
| Hardcoded success evidence | 100% | 100% |
| No injection | 100% | 100% |
| Test weakness | 100% | 100% |
| Remediation | 100% | 100% |
| Avoids abstraction absolutism | 100% | 100% |

100%

45%

Defensive Theater Integrity Review

| Criteria | Without context | With context |
| --- | --- | --- |
| Correct category | 64% | 100% |
| Severity | 60% | 100% |
| Broad exception evidence | 100% | 100% |
| Silent fallback evidence | 25% | 100% |
| Semantic mismatch | 11% | 100% |
| Test weakness | 80% | 100% |
| Remediation | 57% | 100% |
| No overreach | 100% | 100% |

100%

Architecture Bypass Integrity Review

| Criteria | Without context | With context |
| --- | --- | --- |
| Correct category | 100% | 100% |
| Severity | 100% | 100% |
| Bypass evidence | 100% | 100% |
| Lost guarantees | 100% | 100% |
| Test weakness | 100% | 100% |
| Remediation | 100% | 100% |
| Evidence-backed | 100% | 100% |

Failed

Legitimate Fallback Integrity Review

100%

Fake Completion Integrity Review

| Criteria | Without context | With context |
| --- | --- | --- |
| Leads with finding | 100% | 100% |
| Correct category | 100% | 100% |
| Severity | 100% | 100% |
| Evidence | 100% | 100% |
| Contract mismatch | 100% | 100% |
| Test weakness | 100% | 100% |
| Remediation | 100% | 100% |
| No lint noise | 100% | 100% |

Evaluated

Agent: Claude Code
Model: Claude Sonnet 4.6
