Evidence-first pull request review with independent critique, selective challenger review, and human handoff.
87
92%
Does it follow best practices?
Impact
87%
1.31x Average score across 43 eval scenarios
Risky
Do not use without reviewing
Risk classified green | 80% | 100%
No false positive findings | 70% | 100%
Detects oversized PR | 0% | 100%
Recommends splitting | 0% | 100%
Notes WIP status | 100% | 100%
Risk classified green | 70% | 100%
No false positive findings | 100% | 100%
Risk classified green | 0% | 100%
No false positive findings | 40% | 70%
Minimal review overhead | 0% | 60%
Risk classified green | 0% | 100%
No false positive findings | 0% | 0%
Risk classified green | 100% | 100%
No false positive findings | 100% | 100%
Risk classified green | 90% | 100%
No false positive findings | 100% | 100%
Detects compilation failure | 100% | 100%
Detects test failure | 100% | 12%
Catches IDOR vulnerability | 26% | 100%
Distinguishes UI hiding from real authorization | 12% | 100%
Catches same-AZ replica | 0% | 100%
Catches missing replica backups | 50% | 100%
Detects oversized PR | 75% | 100%
Recommends splitting | 100% | 100%
Flags missing description | 100% | 100%
Escalates due to auth changes | 100% | 100%
Catches silent error swallowing | 100% | 100%
Risk classified green or yellow | 0% | 0%
No false positive on every-to-some change | 0% | 0%
Minimal review overhead | 20% | 40%
Catches vulnerable dependencies | 100% | 100%
Names specific packages | 100% | 100%
Catches data race on shared counters | 100% | 100%
Catches cross-batch result leaking | 25% | 25%
Risk classified red | 50% | 70%
Detects AI authorship | 0% | 0%
Catches removed error handling | 100% | 100%
Catches removed context propagation | 100% | 100%
Risk classified red | 100% | 100%
Catches shutdown ordering bug | 100% | 53%
Risk classified yellow or higher | 100% | 100%
Catches stale authorization cache | 0% | 0%
Risk classified yellow or higher | 0% | 0%
Catches unsanitized header propagation | 100% | 100%
Catches response header echo risk | 80% | 30%
Risk classified yellow or higher | 100% | 100%
Catches health check pool contention | 33% | 80%
Risk classified yellow or higher | 70% | 100%
Catches dangerous resource reduction | 100% | 100%
Identifies cascading restart risk | 25% | 50%
Risk classified yellow or higher | 100% | 100%
Catches destroy-and-recreate risk | 100% | 100%
Catches removed safety guards | 100% | 20%
Catches apply_immediately risk | 100% | 100%
Risk classified red | 70% | 100%
Risk classified red | 100% | 100%
Catches open-to-world security groups | 100% | 100%
Catches database exposed to internet | 100% | 100%
Catches unencrypted notification endpoint | 100% | 100%
Catches overly permissive SNS policy | 100% | 100%
Risk classified red or yellow | 100% | 100%
Catches Glacier retrieval impact | 80% | 100%
Risk classified yellow or higher | 100% | 100%
Catches TOCTOU race on discount usage | 100% | 100%
Catches negative charge amount | 100% | 100%
Catches decrement-before-charge ordering | 100% | 0%
Risk classified red | 0% | 100%
Catches session never-expire risk | 100% | 100%
Catches unbounded Redis memory growth | 100% | 100%
Risk classified yellow or higher | 100% | 100%
Catches default provider mismatch | 0% | 100%
Identifies total payment outage impact | 0% | 100%
Risk classified red | 0% | 0%
Catches non-atomic rate limit check | 100% | 100%
Identifies security impact on brute force protection | 100% | 100%
Risk classified red | 100% | 100%
Catches hardcoded secrets | 100% | 100%
Detects AI authorship | 0% | 100%
Risk classified red | 100% | 100%
Catches timing attack vulnerability | 100% | 100%
Risk classified yellow or higher | 100% | 100%
Does not raise irrelevant findings | 60% | 100%
Catches TOCTOU race condition | 100% | 100%
Risk classified yellow or higher | 100% | 100%
Catches information disclosure | 100% | 100%
Risk classified yellow or higher | 60% | 100%
Catches non-transactional refund risk | 100% | 100%
Risk classified yellow or higher | 0% | 100%
Catches in-memory dedup limitation | 66% | 100%
Risk classified yellow or higher | 0% | 100%
Catches sort direction injection | 100% | 100%
Risk classified yellow or higher | 100% | 100%
Catches stale rate data | 100% | 100%
Risk classified yellow or higher | 100% | 100%
Catches 401 silently resolved as success | 100% | 100%
Catches removed auth redirect behavior | 100% | 100%
Risk classified red | 0% | 100%
Catches token storage security downgrade | 100% | 100%
Catches refresh token exposure | 100% | 100%
Risk classified red | 60% | 100%
Catches CSV injection | 0% | 100%
Risk classified yellow or higher | 0% | 100%
Catches uncapped backoff | 100% | 33%
Risk classified yellow or higher | 0% | 100%
Catches stale role in cache | 0% | 100%
Risk classified yellow or higher | 0% | 100%
Catches unsafe localStorage parsing | 33% | 53%
Risk classified yellow or higher | 0% | 100%