Evidence-first pull request review with independent critique, selective challenger review, and human handoff.
Merge, deduplicate, and rank all candidate findings into a compact set worth a human's attention.
Use this skill after fresh-eyes-review and optionally challenger-review have produced candidate findings, and before human-review-handoff.
Inputs: findings from fresh-eyes-review and from challenger-review (if it ran).
Each finding is one JSON object. Example:
```json
{
  "source": "fresh-eyes-review",
  "file": "auth/login.py",
  "line_start": 42,
  "line_end": 55,
  "issue": "SQL injection via unsanitized user input in query construction",
  "severity": 3,
  "confidence": 3,
  "evidence": "User-supplied `username` concatenated directly into SQL string at line 47.",
  "verification_support": 3
}
```
Merge all finding sources. Collect findings from all reviewers and verifiers into one list.
Deduplicate — apply the following strategies in order:
Apply these strategies directly. If the tile's dedupe_findings.py script is available, you may use it — but do not fail or stop if it is missing. See REFERENCE.md for detailed algorithm documentation.
Suppress weak findings:
Rank survivors by severity × confidence × verification_support (each 1–3).
| Finding | Severity | Confidence | Verification Support | Score |
|---|---|---|---|---|
| SQL injection in login handler | 3 | 3 | 3 | 27 |
| Unhandled null in parser | 2 | 3 | 2 | 12 |
Higher scores surface first; findings below the evidence-threshold score are suppressed.
Surface all findings that clear the evidence threshold. No arbitrary caps.
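The ranking rule above can be sketched in a few lines of Python. This is a minimal illustration, not the tile's own implementation: the function names `score` and `rank` are hypothetical, and the cutoff `EVIDENCE_THRESHOLD = 8` is an assumed value, since the threshold is not stated here.

```python
from typing import Any

# Assumed cutoff for illustration only; the actual evidence threshold is not given.
EVIDENCE_THRESHOLD = 8


def score(finding: dict[str, Any]) -> int:
    """Multiply the three 1-3 ratings into one rank score (range 1-27)."""
    return (
        finding["severity"]
        * finding["confidence"]
        * finding["verification_support"]
    )


def rank(findings: list[dict[str, Any]],
         threshold: int = EVIDENCE_THRESHOLD) -> list[dict[str, Any]]:
    """Score every finding, flag those below the threshold, sort highest first."""
    ranked = []
    for finding in findings:
        s = score(finding)
        suppressed = s < threshold
        ranked.append({
            **finding,
            "score": s,
            "suppressed": suppressed,
            "suppression_reason": (
                "score below evidence threshold" if suppressed else None
            ),
        })
    # No cap on the number of findings: everything is kept, only flagged and ordered.
    ranked.sort(key=lambda f: f["score"], reverse=True)
    return ranked


findings = [
    {"issue": "Unhandled null in parser",
     "severity": 2, "confidence": 3, "verification_support": 2},
    {"issue": "SQL injection in login handler",
     "severity": 3, "confidence": 3, "verification_support": 3},
]
for f in rank(findings):
    print(f["score"], f["issue"])
```

With the two findings from the table, the SQL injection (27) is listed before the parser issue (12), and both clear the assumed threshold.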
Handle missing sources. If a reviewer source returned malformed or empty output, skip that source, proceed with remaining sources, and note the missing source in the output metadata.
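One way to merge sources while tolerating a malformed or empty one is sketched below. It assumes each source hands back raw text that should parse as a JSON array of finding objects; that transport format, the function name `merge_sources`, and the metadata key `missing_sources` are all illustrative assumptions.

```python
import json
from typing import Any, Optional


def merge_sources(
    raw_outputs: dict[str, Optional[str]],
) -> tuple[list[dict[str, Any]], dict[str, list[str]]]:
    """Merge findings from every source; skip and record unusable sources.

    raw_outputs maps a source name to its raw output (assumed: a JSON array).
    """
    merged: list[dict[str, Any]] = []
    missing: list[str] = []
    for source, raw in raw_outputs.items():
        try:
            findings = json.loads(raw)
        except (json.JSONDecodeError, TypeError):
            # Malformed text, or no output at all (None).
            missing.append(source)
            continue
        is_finding_list = isinstance(findings, list) and all(
            isinstance(f, dict) for f in findings
        )
        if not is_finding_list or not findings:
            # Wrong shape or empty list: treat the source as missing.
            missing.append(source)
            continue
        for finding in findings:
            # Keep the finding's own "source" if present, else use the key.
            merged.append({**finding, "source": finding.get("source", source)})
    return merged, {"missing_sources": missing}


merged, metadata = merge_sources({
    "fresh-eyes-review": '[{"file": "auth/login.py", "issue": "SQL injection"}]',
    "challenger-review": "not valid json",
})
print(len(merged), metadata)
```

The remaining sources are processed as usual, and the skipped source is reported in the metadata rather than causing a failure.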
Each finding gains:
- corroborated_by: list of agreeing sources
- contested_by: list of disagreeing sources
- merged_confidence: high | medium | low
- suppressed: true/false
- suppression_reason: why it was suppressed (null if not suppressed)