Review existing code, diffs, branches, or pull requests using concern-specific reviewer personas and evidence. Use when auditing someone else's work, triaging risk in a PR, or producing a ship-it / needs-review / blocked verdict. Do not use to verify your own completed change; use `verify` for that.
98
100%
Does it follow best practices?
Impact
96%
1.20xAverage score across 4 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent performs a thorough code-shape review on a TypeScript module, correctly identifying unsafe type escapes (any, as casts, non-null assertions), dead and duplicate code, catch-all error handling, and comments that only narrate the code — and whether the output follows the expected verdict and findings format.",
"type": "weighted_checklist",
"checklist": [
{
"name": "any type flagged",
"description": "Flags the use of 'any' type (in decodeToken and/or parseJwt parameters/return types) as a safety failure or type-safety concern",
"max_score": 10
},
{
"name": "Unsafe cast flagged",
"description": "Flags the unsafe 'as' cast in refreshToken (casting decoded payload to a specific shape without validation) as a safety concern",
"max_score": 10
},
{
"name": "Non-null assertion flagged",
"description": "Flags the non-null assertion operator (payload.userId!) in getUserId as a safety failure",
"max_score": 10
},
{
"name": "Dead code identified",
"description": "Identifies legacyValidate as dead/unused code that should be deleted",
"max_score": 10
},
{
"name": "Duplicate logic identified",
"description": "Identifies that decodeToken and parseJwt are functionally identical and flags this as duplication",
"max_score": 10
},
{
"name": "Catch-all error flagged",
"description": "Flags the generic catch(e) blocks in validateToken and/or hasPermission that swallow all errors without classification",
"max_score": 10
},
{
"name": "Error classification recommended",
"description": "Recommends distinguishing between error types (e.g. malformed token, expired token, missing field) rather than collapsing all failures to a single false return",
"max_score": 10
},
{
"name": "Narrating comments flagged",
"description": "Identifies at least two comments that merely restate what the code does (e.g. '// Check if the token exists', '// Decode the token', '// Return the user ID field') as commentary that should be removed",
"max_score": 10
},
{
"name": "Findings tied to impact",
"description": "At least one finding includes an explanation of what could break or who is affected (e.g. non-null assertion could throw at runtime in production, catch-all silently hides malformed token attacks)",
"max_score": 10
},
{
"name": "Valid verdict",
"description": "Report concludes with one of exactly three verdict values: 'ship it', 'needs review', or 'blocked' — no other phrasing",
"max_score": 10
},
{
"name": "Compact verdict block",
"description": "Report does not end with a prose-heavy recap: detailed findings appear once, and verdict metadata is limited to no more than 4 labeled lines with no scope/persona noise unless needed",
"max_score": 8
}
]
}