Run structured Codex/Claude autoreview closeout: choose the target, collect schema-validated findings, and rerun tests plus review until clean.
84
89%
Does it follow best practices?
Impact
74%
1.08xAverage score across 4 eval scenarios
Passed
No known issues
{
"context": "Tests whether autoreview chooses the correct branch/PR target for a clean committed branch instead of running an empty dirty-local review, and whether it preserves the closeout proof loop.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Branch target selected",
"description": "The plan selects branch review against the PR base, such as `--mode branch --base origin/main`, rather than local/dirty mode",
"max_score": 18
},
{
"name": "PR base discovery command",
"description": "The commands include discovering the actual PR base with `gh pr view --json baseRefName --jq .baseRefName` or an equivalent reliable source",
"max_score": 12
},
{
"name": "Autoreview helper used",
"description": "The commands run the bundled autoreview helper directly rather than asking another model to perform a generic review",
"max_score": 14
},
{
"name": "No empty local review",
"description": "The plan explicitly avoids dirty-local/local mode because the working tree is clean and the committed branch diff is the real target",
"max_score": 14
},
{
"name": "No push requirement",
"description": "The plan does not push the branch solely to make autoreview possible",
"max_score": 10
},
{
"name": "Focused proof loop",
"description": "The plan says accepted findings require focused tests or proof plus a rerun of autoreview until no accepted/actionable findings remain",
"max_score": 14
},
{
"name": "Reject advisory weakness",
"description": "The plan treats review output as advisory and says findings must be verified before fixing",
"max_score": 10
},
{
"name": "Compact closeout shape",
"description": "The output is a concise plan with the requested sections and no verbose review transcript",
"max_score": 8
}
]
}