Review existing code, diffs, branches, or pull requests using concern-specific reviewer personas, backing each finding with evidence. Use when auditing someone else's work, triaging risk in a PR, or producing a ship-it / needs-review / blocked verdict. Do not use it to verify your own completed change; use `verify` for that.
Scores: 98; 100% (Does it follow best practices?); Impact 92% (1.31x average score across 3 eval scenarios); Passed, no known issues.
{
"context": "Tests whether the agent correctly selects reviewer personas for a refactor that touches types, error handling, dead code, and auth. The agent should apply the default personas plus the relevant conditional ones, avoid spawning unnecessary personas, load the repo's AGENTS.md, and produce a well-structured report that states a verdict, lists the personas used, and cites evidence.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Default personas used",
"description": "The report mentions using at least two of the three default personas: general, tests, silent-failures",
"max_score": 8
},
{
"name": "Types persona included",
"description": "The report mentions using the types reviewer persona (or an equivalent type-safety lens), which is warranted by the newly introduced AuthToken/zod schema",
"max_score": 10
},
{
"name": "Cleanup persona included",
"description": "The report mentions using the cleanup reviewer persona (or an equivalent dead-code lens), which is warranted because middleware.old.ts is a dead file",
"max_score": 8
},
{
"name": "Comments persona omitted",
"description": "The report does NOT mention using a comments reviewer persona, since no substantial comment changes justify one",
"max_score": 6
},
{
"name": "Repo guidance loaded",
"description": "The report references or incorporates at least one rule from AGENTS.md (e.g. zod requirement, logging prohibition, 401 JSON shape, integration test requirement)",
"max_score": 10
},
{
"name": "Personas listed in output",
"description": "The review report explicitly lists the reviewer personas used in a dedicated section or field",
"max_score": 6
},
{
"name": "Verdict present",
"description": "The review report contains exactly one of: 'ship it', 'needs review', or 'blocked' as a verdict",
"max_score": 8
},
{
"name": "Scope stated",
"description": "The report names the scope reviewed (e.g. the PR, the auth middleware, the specific files)",
"max_score": 6
},
{
"name": "Findings ordered by severity",
"description": "When multiple findings are present, they are listed with severity labels (e.g. high/medium/low) or are otherwise ranked by risk level",
"max_score": 8
},
{
"name": "Evidence for findings",
"description": "At least one finding includes a specific file reference (with or without line number) rather than a vague description",
"max_score": 10
},
{
"name": "Silent failures finding",
"description": "The report identifies the silent error swallowing in middleware.old.ts (catch blocks that call next() without passing the error) as a concern, OR notes that this file is dead/unused",
"max_score": 8
},
{
"name": "Unverified areas acknowledged",
"description": "The report includes a section or statement about unverified areas, readiness gaps, or surfaces that could not be confirmed",
"max_score": 6
},
{
"name": "Recommended follow-up",
"description": "The report includes a recommended follow-up action from the allowed options: implementation, verify, agent-readiness, or docs",
"max_score": 6
}
]
}
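The `max_score` values in this checklist sum to 100, so a report's earned points read directly as a percentage. The sketch below shows one plausible way a grader could total a `weighted_checklist`. It assumes each item is awarded between 0 and its `max_score` and that missing items earn 0. The `score_report` helper and this clamping rule are hypothetical illustrations, not part of the eval format itself.

```python
# Item names and weights copied from the checklist above.
CHECKLIST = [
    {"name": "Default personas used", "max_score": 8},
    {"name": "Types persona included", "max_score": 10},
    {"name": "Cleanup persona included", "max_score": 8},
    {"name": "Comments persona omitted", "max_score": 6},
    {"name": "Repo guidance loaded", "max_score": 10},
    {"name": "Personas listed in output", "max_score": 6},
    {"name": "Verdict present", "max_score": 8},
    {"name": "Scope stated", "max_score": 6},
    {"name": "Findings ordered by severity", "max_score": 8},
    {"name": "Evidence for findings", "max_score": 10},
    {"name": "Silent failures finding", "max_score": 8},
    {"name": "Unverified areas acknowledged", "max_score": 6},
    {"name": "Recommended follow-up", "max_score": 6},
]


def score_report(checklist, awarded):
    """Hypothetical scorer: clamp each awarded value to [0, max_score],
    treat unlisted items as 0, and return earned / total possible points."""
    total = sum(item["max_score"] for item in checklist)
    earned = 0
    for item in checklist:
        points = awarded.get(item["name"], 0)
        earned += min(max(points, 0), item["max_score"])
    return earned / total


# Example: full marks everywhere except that a comments persona was spawned.
awarded = {item["name"]: item["max_score"] for item in CHECKLIST}
awarded["Comments persona omitted"] = 0
print(score_report(CHECKLIST, awarded))
```

Clamping keeps a grader's over-generous award on one item from inflating the total past the weight the rubric assigns to it.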