Evidence-first pull request review with independent critique, selective challenger review, and human handoff.
89
92%
Does it follow best practices?
Impact
89%
1.36x Average score across 43 eval scenarios
Risky
Do not use without reviewing
Every surfaced finding must be backed by concrete evidence. The evidence threshold is the volume control: it alone decides how many findings surface, and there is no arbitrary cap on their number.
If you can trace a concrete problem through the code (e.g., data flows from user input to a query, a cache stores authorization state that can become stale, a check-then-act sequence has a gap), that IS concrete evidence: it counts as "explicit contextual reasoning tied to changed code." State it as a finding with the confidence you actually have. Do not downgrade clear reasoning-based findings to questions just because no verifier flagged them. The point of the evidence threshold is to filter out vague intuition, not to penalize careful code reading.
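One way to picture the threshold is as a filter rather than a limit. The sketch below is illustrative only: the `Candidate` class, its fields, and the `surface` function are hypothetical names, not part of the skill itself, and the sample evidence strings are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A possible finding (hypothetical structure for illustration)."""
    title: str
    evidence: str      # the traced problem; empty means only intuition
    confidence: float  # the confidence the reviewer actually has, 0.0 to 1.0


def surface(candidates):
    """Keep every candidate with concrete evidence; apply no numeric cap."""
    return [c for c in candidates if c.evidence.strip()]


candidates = [
    Candidate(
        "Query built from user input",
        "user_id from the request reaches the query string unescaped",
        0.8,
    ),
    Candidate("Code feels fragile", "", 0.3),
    Candidate(
        "Cached authorization state can go stale",
        "is_admin is cached and never invalidated when roles change",
        0.7,
    ),
]

surfaced = surface(candidates)
for c in surfaced:
    print(f"{c.title} (confidence {c.confidence})")
```

Both reasoning-based findings pass even though no verifier flagged them, and each keeps its stated confidence; only the candidate with no traced evidence is dropped.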
A Terraform resource with backup_retention_period = 0, a replica pinned to the same AZ as its primary, and a storage lifecycle rule that transitions objects to a tier with hours-long retrieval are all concrete, hunk-level code evidence with specific, measurable consequences. Do not suppress infrastructure configuration findings as "standard practice," "default value," or "out of scope." If the configuration is in the diff and has a plausible operational impact, it clears the evidence threshold.
Infrastructure findings are often multi-part, so do not stop at the first issue. A replica resource may have a same-AZ placement problem AND a backup retention problem AND an engine change that triggers destroy-and-recreate. Each is a separate finding with independent impact. Surface all of them.
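The replica example can be sketched as a check that collects every issue instead of returning at the first one. This is a minimal illustration under stated assumptions: `review_replica` is a hypothetical helper, and `old` and `new` are plain dictionaries standing in for the resource's attributes before and after the diff, not the output of any real Terraform parser.

```python
def review_replica(old, new, primary_az):
    """Return every finding for a replica resource, not just the first.

    `old` and `new` are hypothetical attribute dictionaries for the
    resource before and after the change.
    """
    findings = []
    if new.get("availability_zone") == primary_az:
        findings.append("replica is pinned to the same AZ as its primary")
    if new.get("backup_retention_period") == 0:
        findings.append("backup_retention_period = 0")
    if old.get("engine") != new.get("engine"):
        findings.append("engine change may trigger destroy-and-recreate")
    return findings


old = {"engine": "mysql", "availability_zone": "az-a", "backup_retention_period": 7}
new = {"engine": "postgres", "availability_zone": "az-a", "backup_retention_period": 0}

findings = review_replica(old, new, primary_az="az-a")
for finding in findings:
    print(finding)
```

Each check appends independently, so one resource can yield three separate findings, each of which can then be reported with its own impact.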
Every surfaced finding must include its evidence type so the human reviewer can calibrate how much to trust it.
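A small sketch of what carrying the evidence type might look like follows. The `EvidenceType` members, the `Finding` class, and the `render` function are assumptions made for illustration; the text above names the kinds of evidence in prose but does not define a fixed list or output format.

```python
from dataclasses import dataclass
from enum import Enum


class EvidenceType(Enum):
    # Hypothetical labels drawn from phrases used in the text above.
    HUNK_CODE = "hunk-level code evidence"
    CONTEXTUAL_REASONING = "explicit contextual reasoning tied to changed code"
    VERIFIER_OUTPUT = "verifier output"


@dataclass(frozen=True)
class Finding:
    title: str
    evidence_type: EvidenceType
    evidence: str


def render(finding):
    """Format a finding so its evidence type is always visible."""
    return (
        f"{finding.title}\n"
        f"  evidence type: {finding.evidence_type.value}\n"
        f"  evidence: {finding.evidence}"
    )


finding = Finding(
    title="Backups disabled on replica",
    evidence_type=EvidenceType.HUNK_CODE,
    evidence="backup_retention_period = 0 in the changed resource",
)
text = render(finding)
print(text)
```

Making the evidence type a required field means a finding cannot be constructed without one, which keeps the label from being silently omitted at handoff.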
evals
scenario-1 through scenario-43
rules
skills
challenger-review
finding-synthesizer
fresh-eyes-review
human-review-handoff
pr-evidence-builder
review-retrospective