Review existing code, diffs, branches, or pull requests by spawning mandatory concern-specific reviewer subagents, then synthesize a ship-it / needs-review / blocked verdict.
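As a rough illustration of the synthesis step described above, the sketch below collapses findings from several reviewer subagents into one of the three verdicts. The severity names (`blocker`, `major`, `minor`), the `Finding` fields, and the mapping rule are assumptions made for this example and are not taken from the skill itself.

```python
from dataclasses import dataclass

# Hypothetical severity scale; the skill's real scale is not given here.
SEVERITY_ORDER = {"blocker": 0, "major": 1, "minor": 2}


@dataclass(frozen=True)
class Finding:
    persona: str   # reviewer subagent that raised it, e.g. "types" or "tests"
    severity: str  # one of the keys in SEVERITY_ORDER
    location: str  # file and line reference, e.g. "src/api.ts:42"
    summary: str


def order_by_severity(findings):
    """Return findings with the most severe first."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])


def synthesize_verdict(findings, unverified=()):
    """Collapse findings into a verdict (assumed rule, not the skill's own)."""
    severities = {f.severity for f in findings}
    if "blocker" in severities:
        return "blocked"
    # Major findings or surfaces nobody verified both call for a human look.
    if "major" in severities or unverified:
        return "needs-review"
    return "ship-it"
```

Under these assumptions, a review with only minor findings and no unverified surfaces yields `ship-it`, while a single blocker from any persona yields `blocked`.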
92
97%
Does it follow best practices?
Impact
81%
1.22x Average score across 4 eval scenarios
Passed
No known issues
any type flagged | 100% | 100%
Unsafe cast flagged | 0% | 100%
Non-null assertion flagged | 100% | 100%
Dead code identified | 100% | 100%
Duplicate logic identified | 100% | 100%
Catch-all error flagged | 100% | 100%
Error classification recommended | 100% | 100%
Narrating comments flagged | 100% | 100%
Findings tied to impact | 100% | 100%
Valid verdict | 0% | 100%
Compact verdict block | 62% | 100%

CLAUDE.md loaded | 100% | 100%
File references in findings | 100% | 100%
Line-level evidence | 100% | 100%
Verdict present | 37% | 100%
Findings by severity | 100% | 100%
Error classification finding | 100% | 100%
Silent failure finding | 62% | 100%
Mock-heavy test concern | 16% | 83%
Dead code identified | 100% | 100%
Unverified surfaces marked | 0% | 100%
Recommended follow-up | 100% | 100%
No nit inflation | 100% | 100%
Reviewer gang listed | 0% | 0%
Compact verdict block | 50% | 100%

Default gang spawned | 0% | 0%
General persona used | 50% | 37%
Comments persona used | 60% | 50%
Tests persona included | 37% | 50%
Verdict present | 50% | 40%
Scope stated | 75% | 75%
Personas listed in output | 0% | 0%
No nit inflation | 100% | 100%
Docstring accuracy noted | 100% | 100%
Unverified areas or residual risk | 75% | 100%
Recommended follow-up | 30% | 40%
Compact verdict block | 75% | 62%

Default gang spawned | 0% | 50%
Types persona included | 60% | 80%
Cleanup persona included | 62% | 75%
Comments persona omitted | 100% | 100%
Repo guidance loaded | 100% | 100%
Personas listed in output | 0% | 0%
Verdict present | 62% | 100%
Scope stated | 83% | 83%
Findings ordered by severity | 100% | 100%
Evidence for findings | 100% | 100%
Silent failures finding | 87% | 100%
Unverified areas acknowledged | 16% | 100%
Recommended follow-up | 100% | 100%
Compact verdict block | 50% | 83%