Review existing code, diffs, branches, or pull requests by spawning mandatory concern-specific reviewer subagents, then synthesize a ship-it / needs-review / blocked verdict.
85
Does it follow best practices? 89%
Impact: 77% (1.13x), average score across 4 eval scenarios
Passed: no known issues
any type flagged: 100%, 100%
Unsafe cast flagged: 30%, 60%
Non-null assertion flagged: 100%, 100%
Dead code identified: 100%, 100%
Duplicate logic identified: 100%, 100%
Catch-all error flagged: 100%, 100%
Error classification recommended: 60%, 100%
Narrating comments flagged: 100%, 0%
Findings tied to impact: 100%, 100%
Valid verdict: 80%, 100%
Compact verdict block: 50%, 50%

CLAUDE.md loaded: 100%, 100%
File references in findings: 100%, 100%
Line-level evidence: 100%, 100%
Verdict present: 50%, 100%
Findings by severity: 100%, 100%
Error classification finding: 100%, 100%
Silent failure finding: 100%, 100%
Mock-heavy test concern: 33%, 33%
Dead code identified: 100%, 100%
Unverified surfaces marked: 12%, 100%
Recommended follow-up: 100%, 100%
No nit inflation: 100%, 100%
Reviewer gang listed: 0%, 0%
Compact verdict block: 62%, 37%

Default gang spawned: 0%, 0%
General persona used: 37%, 62%
Comments persona used: 40%, 70%
Tests persona included: 25%, 50%
Code-shape persona included: 0%, 25%
Verdict present: 50%, 60%
Scope stated: 75%, 87%
Personas listed in output: 0%, 0%
No nit inflation: 100%, 100%
Docstring accuracy noted: 100%, 100%
Unverified areas or residual risk: 25%, 37%
Recommended follow-up: 50%, 50%
Compact verdict block: 50%, 25%

Default gang spawned: 0%, 100%
Subagent authorization honored: 0%, 75%
Types persona included: 80%, 70%
Cleanup persona included: 75%, 87%
Comments persona omitted: 100%, 100%
Repo guidance loaded: 100%, 100%
Personas listed in output: 0%, 100%
Verdict present: 100%, 100%
Scope stated: 66%, 83%
Findings ordered by severity: 100%, 100%
Evidence for findings: 100%, 100%
Silent failures finding: 100%, 100%
Unverified areas acknowledged: 66%, 100%
Recommended follow-up: 100%, 100%
Compact verdict block: 50%, 66%