Evidence-first pull request review with independent critique, selective challenger review, and human handoff.
Overall score: 87
Best practices: 92% (Does it follow best practices?)
Impact: 87% (1.31x; average score across 43 eval scenarios)
Risky: do not use without reviewing
The risk lane determines which skills run, how findings are surfaced, and how much time the tile spends.
Use challenger review selectively based on risk lane or reviewer uncertainty. Never skip the evidence-build stage regardless of risk lane.
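The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the tile's actual implementation: the function name `plan_skills`, the lane strings, and the choice of "red" as the lane that triggers challenger review are all assumptions. Only the skill names come from the tile's skill list.

```python
# Hypothetical sketch: which skills run for a given risk lane.
# Lane names and the set of lanes that trigger challenger review are assumed.
CHALLENGER_LANES = {"red"}  # assumption: only the highest lane always gets a challenger


def plan_skills(lane: str, reviewer_uncertain: bool = False) -> list[str]:
    """Return the skills to run, always starting with the evidence build."""
    # The evidence-build stage is never skipped, whatever the lane.
    skills = ["pr-evidence-builder", "fresh-eyes-review"]

    # Challenger review is selective: high-risk lane or an uncertain reviewer.
    if lane in CHALLENGER_LANES or reviewer_uncertain:
        skills.append("challenger-review")

    skills.append("finding-synthesizer")
    return skills
```

For example, `plan_skills("green")` omits `challenger-review`, while `plan_skills("green", reviewer_uncertain=True)` includes it. In both cases `pr-evidence-builder` runs first.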
When the risk classifier's confidence is low, route to the next higher lane. False red is cheap. False green is dangerous.
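One way to express the escalation rule is shown below. It is a sketch under stated assumptions: the text only implies a green and a red lane, so the intermediate `YELLOW` lane, the `route` function, and the 0.7 confidence threshold are hypothetical and chosen for illustration.

```python
from enum import IntEnum


class Lane(IntEnum):
    GREEN = 0
    YELLOW = 1  # assumed intermediate lane, not named in the text
    RED = 2


def route(predicted: Lane, confidence: float, threshold: float = 0.7) -> Lane:
    """Escalate one lane when the classifier is not confident enough.

    The threshold value is an illustrative assumption.
    """
    if confidence < threshold:
        # Move to the next higher lane, capped at RED.
        return Lane(min(predicted + 1, Lane.RED))
    return predicted
```

The asymmetry is deliberate: escalating a safe change costs some extra review time, while under-classifying a risky change lets it through with too little scrutiny.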
If the tile is slower than these budgets, developers will skip it.
evals
  scenario-1
  scenario-2
  scenario-3
  scenario-4
  scenario-5
  scenario-6
  scenario-7
  scenario-8
  scenario-9
  scenario-10
  scenario-11
  scenario-12
  scenario-13
  scenario-14
  scenario-15
  scenario-16
  scenario-17
  scenario-18
  scenario-19
  scenario-20
  scenario-21
  scenario-22
  scenario-23
  scenario-24
  scenario-25
  scenario-26
  scenario-27
  scenario-28
  scenario-29
  scenario-30
  scenario-31
  scenario-32
  scenario-33
  scenario-34
  scenario-35
  scenario-36
  scenario-37
  scenario-38
  scenario-39
  scenario-40
  scenario-41
  scenario-42
  scenario-43
rules
skills
  challenger-review
  finding-synthesizer
  fresh-eyes-review
  human-review-handoff
  pr-evidence-builder
  review-retrospective