Run structured Codex/Claude autoreview closeout: choose the target, collect schema-validated findings, and rerun tests plus review until clean.
84
89%
Does it follow best practices?
Impact
74%
1.08xAverage score across 4 eval scenarios
Passed
No known issues
{
"context": "Tests whether autoreview applies release-branch freeze discipline: fix only release blockers, reject non-blocking cleanup, rerun focused proof, and capture forward-port/follow-up decisions.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Release freeze recognized",
"description": "The closeout states that the release branch only allows blockers such as crashes, data loss, install/upgrade breakage, or concrete security exposure",
"max_score": 16
},
{
"name": "Crash finding accepted",
"description": "Finding A is accepted as release-blocking because it preserves the startup crash under a missing config file",
"max_score": 18
},
{
"name": "Style finding deferred",
"description": "Finding B is deferred to main/follow-up rather than fixed on the release branch",
"max_score": 14
},
{
"name": "Smallest allowed fix",
"description": "The fix plan is limited to the missing-file crash path and avoids broad startup refactors",
"max_score": 12
},
{
"name": "Focused release proof",
"description": "Verification includes a focused missing-config startup test or runtime proof tied to the release target",
"max_score": 14
},
{
"name": "Autoreview rerun after fix",
"description": "The closeout requires rerunning autoreview after the crash fix and reporting the clean result or conscious rejection",
"max_score": 10
},
{
"name": "Forward-port decision",
"description": "The response states whether the crash fix must forward-port to main and where the style cleanup should live",
"max_score": 8
},
{
"name": "No branch broadening",
"description": "The response does not introduce new product behavior, process policy, or style-only cleanup on the release branch",
"max_score": 8
}
]
}