Evidence-first pull request review with independent critique, selective challenger review, and human handoff.
Build the final reviewer packet that a human reviewer actually reads.
Use this skill after finding-synthesizer has produced the ranked finding set: always for medium/high-risk PRs, and for low-risk PRs only when findings exist. It consumes the synthesized findings from finding-synthesizer and the evidence pack from pr-evidence-builder.
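The trigger rule above can be sketched as a small predicate. This is only an illustration: the function name `should_build_packet` and its parameters are hypothetical, not part of the skill's actual interface, and the risk labels are assumed to be the strings `low`, `medium`, and `high`.

```python
def should_build_packet(risk: str, finding_count: int) -> bool:
    """Decide whether a reviewer packet should be built (hypothetical helper).

    Assumes risk is one of "low", "medium", "high" (case-insensitive).
    """
    level = risk.strip().lower()
    if level in {"medium", "high"}:
        # Medium/high-risk PRs always get a packet, even with zero findings.
        return True
    if level == "low":
        # Low-risk PRs get a packet only when there is something to report.
        return finding_count > 0
    raise ValueError(f"unknown risk level: {risk!r}")
```

Rejecting unknown labels rather than defaulting keeps a mistyped risk level from silently suppressing a packet.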
The packet contains these sections (see the example output for exact formatting):
## PR Review Packet
### TL;DR
Adds OAuth2 token refresh logic to `auth/session.py`; risk is MEDIUM due to
token-expiry edge cases. One unresolved assumption about clock-skew tolerance
requires human input.
### Risk: MEDIUM
Contributing factors:
- Modifies authentication flow (high-impact surface)
- No new tests for the refresh-failure path
AI-assisted: yes (GitHub Copilot detected in commit metadata)
### Verification Status
| Verifier | Status | Notable Findings |
|-------------------|---------|-----------------------------------------|
| Static analysis | PASS | No new lint errors |
| Test coverage | WARN | refresh-failure branch uncovered (0%) |
| Dependency audit | PASS | No new vulnerable packages |
### Findings (2 items)
1. **Uncovered refresh-failure branch** — `auth/session.py:114`
Why it matters: Silent failure on token refresh could leave users with
expired sessions and no error feedback.
Evidence: Coverage report (pr-evidence-builder)
Suggested action: fix
2. **Clock-skew tolerance undocumented** — `auth/session.py:87`
Why it matters: Token expiry comparison assumes clocks are in sync; distributed
deployments may reject valid tokens prematurely.
Evidence: Code inspection (finding-synthesizer)
Suggested action: discuss
### Unresolved Assumptions
- What is the acceptable clock-skew window for this deployment environment?
### Recommended Review Focus
- `auth/session.py` lines 87–120: expiry logic and the uncovered failure branch
are the highest-risk hunks; verify behaviour under clock drift and network
timeout conditions.
### Metadata
- reviewer mode: fresh_eyes
- reviewer model family: claude-3
- authoring model family: unknown
- wall-clock time: 47 s
- context isolation: yes
---
*This is an evidence-based review aid, not an approval. The tile produces
findings and questions. Humans produce decisions.*

Add the review-boundaries disclaimer. Every packet must include: "This is an evidence-based review aid, not an approval. The tile produces findings and questions. Humans produce decisions."
Mark escalation areas. If any human-escalation triggers are present, surface specific questions the human reviewer should answer.
Validate the packet before delivering it. Run through this checklist:
If any item is missing, regenerate or patch that section before handing off.
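A validation pass of this kind might look like the sketch below. The checklist items themselves are not reproduced here, so the `REQUIRED_SECTIONS` list is an assumption taken from the headings in the example packet, and `missing_packet_items` is a hypothetical helper name. It reports which sections, or the disclaimer, are absent so they can be regenerated or patched.

```python
import re

# Assumption: required sections mirror the headings in the example packet.
REQUIRED_SECTIONS = [
    "TL;DR",
    "Risk",
    "Verification Status",
    "Findings",
    "Unresolved Assumptions",
    "Recommended Review Focus",
    "Metadata",
]

DISCLAIMER = (
    "This is an evidence-based review aid, not an approval. "
    "The tile produces findings and questions. Humans produce decisions."
)


def missing_packet_items(packet: str) -> list[str]:
    """Return the names of required items not found in the packet text."""
    # Collect third-level headings such as "### Risk: MEDIUM".
    headings = [
        m.group(1).strip()
        for m in re.finditer(r"^###\s+(.+)$", packet, re.MULTILINE)
    ]
    # Prefix match, so "Findings (2 items)" satisfies "Findings".
    missing = [
        section
        for section in REQUIRED_SECTIONS
        if not any(h.startswith(section) for h in headings)
    ]
    # Collapse whitespace so a line-wrapped disclaimer still matches.
    flattened = " ".join(packet.split())
    if DISCLAIMER not in flattened:
        missing.append("disclaimer")
    return missing
```

An empty result means the packet passes this structural check; it says nothing about whether the content of each section is adequate.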
- finding-synthesizer: produces the ranked finding set consumed by this skill
- pr-evidence-builder: produces the evidence pack consumed by this skill