Review existing code, diffs, branches, or pull requests by spawning mandatory concern-specific reviewer subagents, then synthesize a ship-it / needs-review / blocked verdict.
87
Does it follow best practices? 90%
Impact: 81% (1.19x)
Average score across 4 eval scenarios
Passed
No known issues
any type flagged: 100% / 100%
Unsafe cast flagged: 30% / 100%
Non-null assertion flagged: 100% / 100%
Dead code identified: 100% / 100%
Duplicate logic identified: 100% / 100%
Catch-all error flagged: 100% / 100%
Error classification recommended: 60% / 100%
Narrating comments flagged: 100% / 100%
Findings tied to impact: 100% / 100%
Valid verdict: 80% / 100%
Compact verdict block: 50% / 75%

CLAUDE.md loaded: 100% / 100%
File references in findings: 100% / 100%
Line-level evidence: 100% / 100%
Verdict present: 50% / 100%
Findings by severity: 100% / 100%
Error classification finding: 100% / 100%
Silent failure finding: 100% / 100%
Mock-heavy test concern: 33% / 100%
Dead code identified: 100% / 100%
Unverified surfaces marked: 12% / 100%
Recommended follow-up: 100% / 100%
No nit inflation: 100% / 100%
Reviewer gang listed: 0% / 100%
Compact verdict block: 62% / 75%

Default gang spawned: 0% / 0%
General persona used: 37% / 25%
Comments persona used: 40% / 30%
Tests persona included: 25% / 25%
Code-shape persona included: 0% / 0%
Verdict present: 50% / 0%
Scope stated: 75% / 75%
Personas listed in output: 0% / 0%
No nit inflation: 100% / 100%
Docstring accuracy noted: 100% / 100%
Unverified areas or residual risk: 25% / 50%
Recommended follow-up: 50% / 0%
Compact verdict block: 50% / 62%

Default gang spawned: 0% / 100%
Subagent authorization honored: 0% / 87%
Types persona included: 80% / 100%
Cleanup persona included: 75% / 100%
Comments persona omitted: 100% / 100%
Repo guidance loaded: 100% / 100%
Personas listed in output: 0% / 100%
Verdict present: 100% / 100%
Scope stated: 66% / 100%
Findings ordered by severity: 100% / 100%
Evidence for findings: 100% / 100%
Silent failures finding: 100% / 100%
Unverified areas acknowledged: 66% / 100%
Recommended follow-up: 100% / 100%
Compact verdict block: 50% / 50%