Review existing code, diffs, branches, or pull requests by spawning mandatory concern-specific reviewer subagents, then synthesize a ship-it / needs-review / blocked verdict.
92
97%
Does it follow best practices?
Impact
81%
1.22xAverage score across 4 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent still spawns the mandatory default reviewer gang for a tiny documentation-only change, avoids unnecessary conditional persona spam, adds the comments persona for doc-heavy risk, produces a valid verdict, and does not over-inflate minor findings into defects.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Default gang spawned",
"description": "The report uses the mandatory default personas: general, tests, and silent-failures",
"max_score": 12
},
{
"name": "General persona used",
"description": "The report includes the general reviewer persona (or equivalent broad code review lens)",
"max_score": 8
},
{
"name": "Comments persona used",
"description": "The report includes the comments reviewer persona (or equivalent documentation/docstring lens) — appropriate given docstring change",
"max_score": 10
},
{
"name": "Tests persona included",
"description": "The report applies the tests reviewer persona as part of the mandatory default gang, while keeping any test-related conclusion brief for a doc-only change",
"max_score": 8
},
{
"name": "Verdict present",
"description": "The report contains exactly one verdict label: 'ship it', 'needs review', or 'blocked'",
"max_score": 10
},
{
"name": "Scope stated",
"description": "The report names the reviewed scope only when needed to disambiguate the review; it does not add a noisy scope line after clear findings",
"max_score": 8
},
{
"name": "Personas listed in output",
"description": "The report shows spawned personas in compact metadata or evidence; it does not force persona details into the verdict footer",
"max_score": 8
},
{
"name": "No nit inflation",
"description": "The report does NOT flag the parameter rename (amount → value) as a defect or high-severity finding — the JSDoc now correctly matches the actual parameter name",
"max_score": 10
},
{
"name": "Docstring accuracy noted",
"description": "The report identifies that the @param name was corrected from 'amount' to 'value' to match the actual function signature — and treats this as an accuracy improvement, not a bug",
"max_score": 8
},
{
"name": "Unverified areas or residual risk",
"description": "The report mentions any residual risk or unverified surfaces, OR explicitly states there are none (either outcome is acceptable — silence is not)",
"max_score": 8
},
{
"name": "Recommended follow-up",
"description": "The report includes a recommended follow-up action such as implementation, runtime verification, readiness setup, documentation cleanup, or none",
"max_score": 10
},
{
"name": "Compact verdict block",
"description": "The report stays concise: findings, if any, are not repeated in the verdict footer, and the final metadata uses no more than 4 labeled lines with no scope/persona noise unless needed",
"max_score": 8
}
]
}