Review existing code, diffs, branches, or pull requests by spawning mandatory concern-specific reviewer subagents, then synthesize a ship-it / needs-review / blocked verdict.
92
97%
Does it follow best practices?
Impact
81%
1.22x
Average score across 4 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent spawns the mandatory default reviewer gang for a refactor that touches types, error handling, dead code, and auth, adds relevant conditional personas, avoids unnecessary extra personas, loads the repo's AGENTS.md, and produces a well-structured output with verdict and evidence without a noisy footer.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Default gang spawned",
"description": "The report mentions spawning all three default reviewer personas: general, tests, and silent-failures",
"max_score": 8
},
{
"name": "Types persona included",
"description": "The report mentions using the types reviewer persona (or equivalent type-safety lens) — appropriate given the new AuthToken/runtime schema introduction",
"max_score": 10
},
{
"name": "Cleanup persona included",
"description": "The report mentions using the cleanup reviewer persona (or equivalent dead code lens) — appropriate given middleware.old.ts is a dead file",
"max_score": 8
},
{
"name": "Comments persona omitted",
"description": "The report does NOT mention using a comments reviewer persona — no substantial comment changes justify it",
"max_score": 6
},
{
"name": "Repo guidance loaded",
"description": "The report references or incorporates at least one rule from AGENTS.md (e.g. Valibot preference, logging prohibition, 401 JSON shape, integration test requirement)",
"max_score": 10
},
{
"name": "Personas listed in output",
"description": "The review report shows spawned personas in compact metadata or evidence, but does not force persona details into the verdict footer",
"max_score": 6
},
{
"name": "Verdict present",
"description": "The review report contains exactly one of: 'ship it', 'needs review', or 'blocked' as a verdict",
"max_score": 8
},
{
"name": "Scope stated",
"description": "The report names the reviewed scope only when needed to disambiguate the review; it does not add a noisy scope line after clear findings",
"max_score": 6
},
{
"name": "Findings ordered by severity",
"description": "When multiple findings are present, they are listed with severity labels (e.g. high/medium/low) or are otherwise ranked by risk level",
"max_score": 8
},
{
"name": "Evidence for findings",
"description": "At least one finding includes a specific file reference (with or without line number) rather than a vague description",
"max_score": 10
},
{
"name": "Silent failures finding",
"description": "The report identifies the silent error swallowing in middleware.old.ts (catch blocks that call next() without error) as a concern — OR notes this file is dead/unused",
"max_score": 8
},
{
"name": "Unverified areas acknowledged",
"description": "The report includes a section or statement about unverified areas, readiness gaps, or surfaces that could not be confirmed",
"max_score": 6
},
{
"name": "Recommended follow-up",
"description": "The report includes a recommended follow-up action such as implementation, runtime verification, readiness setup, documentation cleanup, or none",
"max_score": 6
},
{
"name": "Compact verdict block",
"description": "The report keeps verdict metadata compact: no more than 4 labeled lines after findings, no repeated finding details, no scope/persona noise unless needed, and no prose-heavy recap",
"max_score": 6
}
]
}
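
The criteria file above only declares criterion names and `max_score` weights; it does not say how a grader turns them into a single number. As a minimal sketch, assuming a grader awards each criterion between 0 and its `max_score` and reports earned points over available points, the aggregation might look like the following. The `scoreChecklist` function, the `awarded` map keyed by criterion name, and the clamping rule are all hypothetical and are not taken from the eval harness itself.

```typescript
// Field names mirror the JSON criteria file above.
interface Criterion {
  name: string;
  description: string;
  max_score: number;
}

interface WeightedChecklist {
  context: string;
  type: "weighted_checklist";
  checklist: Criterion[];
}

// Hypothetical aggregation: sum the points awarded per criterion,
// clamp each award to the range [0, max_score], and divide by the
// total points available. Returns a fraction between 0 and 1.
function scoreChecklist(
  spec: WeightedChecklist,
  awarded: Record<string, number>, // assumed shape: criterion name -> points
): number {
  let earned = 0;
  let available = 0;
  for (const item of spec.checklist) {
    const raw = awarded[item.name] ?? 0; // criteria with no award count as zero
    earned += Math.min(Math.max(raw, 0), item.max_score);
    available += item.max_score;
  }
  // An empty checklist has nothing to earn, so report zero rather than NaN.
  return available === 0 ? 0 : earned / available;
}
```

Under this assumed rule the weights act as relative importance: missing a 10-point criterion such as "Repo guidance loaded" costs more than missing a 6-point one such as "Comments persona omitted", and the weights do not need to sum to 100 because the result is normalized by the total available.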