Review existing code, diffs, branches, or pull requests using concern-specific reviewer personas and evidence. Use when auditing someone else's work, triaging risk in a PR, or producing a ship-it / needs-review / blocked verdict. Do not use to verify your own completed change; use `verify` for that.
98
Does it follow best practices? 100%
Impact: 96% (1.20x average score across 4 eval scenarios)
Passed: no known issues
{
"context": "Tests whether the agent correctly selects reviewer personas for a refactor that touches types, error handling, dead code, and auth. The agent should apply the default personas plus relevant conditional ones, avoid spawning unnecessary personas, load the repo's AGENTS.md, and produce a well-structured output with verdict and evidence without a noisy footer.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Default personas used",
"description": "The report mentions using at least two of the three default personas: general, tests, silent-failures",
"max_score": 8
},
{
"name": "Types persona included",
"description": "The report mentions using the types reviewer persona (or equivalent type-safety lens) — appropriate given the new AuthToken/zod schema introduction",
"max_score": 10
},
{
"name": "Cleanup persona included",
"description": "The report mentions using the cleanup reviewer persona (or equivalent dead code lens) — appropriate given middleware.old.ts is a dead file",
"max_score": 8
},
{
"name": "Comments persona omitted",
"description": "The report does NOT mention using a comments reviewer persona — no substantial comment changes justify it",
"max_score": 6
},
{
"name": "Repo guidance loaded",
"description": "The report references or incorporates at least one rule from AGENTS.md (e.g. zod requirement, logging prohibition, 401 JSON shape, integration test requirement)",
"max_score": 10
},
{
"name": "Personas listed in output",
"description": "The review report shows persona choice only when useful for the task or in compact metadata; it does not force persona details into the verdict footer",
"max_score": 6
},
{
"name": "Verdict present",
"description": "The review report contains exactly one of: 'ship it', 'needs review', or 'blocked' as a verdict",
"max_score": 8
},
{
"name": "Scope stated",
"description": "The report names the reviewed scope only when needed to disambiguate the review; it does not add a noisy scope line after clear findings",
"max_score": 6
},
{
"name": "Findings ordered by severity",
"description": "When multiple findings are present, they are listed with severity labels (e.g. high/medium/low) or are otherwise ranked by risk level",
"max_score": 8
},
{
"name": "Evidence for findings",
"description": "At least one finding includes a specific file reference (with or without line number) rather than a vague description",
"max_score": 10
},
{
"name": "Silent failures finding",
"description": "The report identifies the silent error swallowing in middleware.old.ts (catch blocks that call next() without error) as a concern — OR notes this file is dead/unused",
"max_score": 8
},
{
"name": "Unverified areas acknowledged",
"description": "The report includes a section or statement about unverified areas, readiness gaps, or surfaces that could not be confirmed",
"max_score": 6
},
{
"name": "Recommended follow-up",
"description": "The report includes a recommended follow-up action from the allowed options: implementation, verify, agent-readiness, or docs",
"max_score": 6
},
{
"name": "Compact verdict block",
"description": "The report keeps verdict metadata compact: no more than 4 labeled lines after findings, no repeated finding details, no scope/persona noise unless needed, and no prose-heavy recap",
"max_score": 6
}
]
}
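
The scenario above declares `"type": "weighted_checklist"`, and its `max_score` values sum to 106 rather than 100, so a raw point total is not a percentage by itself. The page does not say how the eval harness aggregates the criteria; the sketch below assumes the simplest reading, a normalized weighted sum (points earned divided by points possible). The function name `score_checklist` and the `awarded` mapping are hypothetical and used only for illustration.

```python
import json


def score_checklist(checklist, awarded):
    """Return (earned, possible, percent) for a weighted checklist.

    `awarded` maps a criterion name to the points a grader granted
    (hypothetical input, not part of the scenario file). Points are
    clamped to [0, max_score]; criteria missing from `awarded` count as 0.
    """
    earned = 0
    possible = 0
    for item in checklist:
        cap = item["max_score"]
        points = awarded.get(item["name"], 0)
        earned += max(0, min(points, cap))
        possible += cap
    # Assumption: the final score is a plain normalized sum.
    percent = 100.0 * earned / possible if possible else 0.0
    return earned, possible, percent


# Trimmed-down scenario that reuses two criteria from the checklist above.
scenario = json.loads("""
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "Verdict present", "max_score": 8},
    {"name": "Evidence for findings", "max_score": 10}
  ]
}
""")

earned, possible, percent = score_checklist(
    scenario["checklist"],
    {"Verdict present": 8, "Evidence for findings": 5},
)
print(earned, possible, round(percent, 1))  # prints: 13 18 72.2
```

Clamping each award to its `max_score` keeps a single over-generous grade from inflating the total, and normalizing by the sum of `max_score` values means criteria can be added or re-weighted without rescaling the others by hand.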