Run structured Codex/Claude autoreview closeout: choose the target, collect schema-validated findings, and rerun tests plus review until clean.
84
89%
Does it follow best practices?
Impact
74%
1.08xAverage score across 4 eval scenarios
Passed
No known issues
{
"context": "Tests whether autoreview treats model findings as advisory, accepts only in-scope blockers, rejects over-broad architecture churn, and routes sibling risk to follow-up when outside the requested owner boundary.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Accepted in-scope blocker",
"description": "Finding A is accepted because it is introduced or exposed by the touched retry button and has failing focused proof",
"max_score": 18
},
{
"name": "Architecture churn rejected",
"description": "Finding B is rejected or deferred because a command bus/event-sourcing rewrite is outside the small bug-fix scope",
"max_score": 14
},
{
"name": "Sibling risk follow-up",
"description": "Finding C is treated as a follow-up or inspection note, not patched in this PR without approval",
"max_score": 14
},
{
"name": "Scope baseline referenced",
"description": "The triage ties decisions to the original retry-button request, changed files, and owner boundary",
"max_score": 12
},
{
"name": "Verification after fix",
"description": "The plan reruns the focused RetryInvoiceButton test after fixing the accepted finding",
"max_score": 14
},
{
"name": "Autoreview rerun",
"description": "The plan reruns autoreview after code changes and stops only after no accepted/actionable findings remain",
"max_score": 12
},
{
"name": "No public posting",
"description": "The response does not publish review comments externally; it keeps the triage local",
"max_score": 8
},
{
"name": "Concise final report",
"description": "The final closeout names accepted/rejected findings, proof run, and clean rerun status without dumping long logs",
"max_score": 8
}
]
}