Drive a PR to merge: address review comments (human + Copilot), push fixes, wait for CI to go green, then squash-merge. Use when a human says "babysit PR #NNN", "address the comments and merge when green", or "get this PR landed". Pushes and merges — invoking it IS the authorization to do so. Refuses to merge on red CI, unresolved blocking reviews, or conflicts; escalates instead.
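The refusal rules in the description can be sketched as a small gate function. This is only an illustration, not the skill's actual implementation: `PullRequestState`, its field names, and the returned action strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequestState:
    """Hypothetical snapshot of a PR; field names are illustrative only."""
    number: int
    failing_checks: list = field(default_factory=list)
    pending_checks: list = field(default_factory=list)
    changes_requested: bool = False
    has_conflicts: bool = False


def merge_decision(pr):
    """Return (action, reasons) following the refusal rules in the description."""
    blockers = []
    if pr.failing_checks:
        blockers.append("failing checks: " + ", ".join(pr.failing_checks))
    if pr.changes_requested:
        blockers.append("unresolved blocking review")
    if pr.has_conflicts:
        blockers.append("merge conflicts")
    if blockers:
        # Red CI, blocking reviews, or conflicts: escalate instead of merging.
        return ("escalate", blockers)
    if pr.pending_checks:
        # CI has not finished yet: keep waiting for green.
        return ("wait", ["pending checks: " + ", ".join(pr.pending_checks)])
    return ("squash-merge", [])
```

Under these assumptions, a PR with any failing check always escalates, even if every other condition is satisfied.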
Best practices: 90%
Impact: 94% (1.17x average score across 3 eval scenarios)
Advisory: suggest reviewing before use
{
  "context": "Tests whether the agent correctly applies all merge preconditions (green CI, no requesting-changes reviews, no conflicts, up-to-date base), uses squash merge with Conventional Commit title, deletes head branches, recognises that Dependabot PRs require triage before babysitting, and flags the failed CI check as a blocker rather than ignoring it.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "PR 301 merge decision",
      "description": "PR 301 is decided as 'merge' (all checks green, approved, clean, up to date)",
      "max_score": 8
    },
    {
      "name": "PR 301 squash strategy",
      "description": "PR 301 merge entry specifies squash as the merge strategy (not merge commit or rebase)",
      "max_score": 10
    },
    {
      "name": "PR 301 Conventional Commit title",
      "description": "PR 301 merge entry includes a commit title in Conventional Commit format (e.g. 'feat: ...' matching the PR type)",
      "max_score": 8
    },
    {
      "name": "PR 301 branch deletion",
      "description": "PR 301 entry states the head branch should be deleted after merging",
      "max_score": 6
    },
    {
      "name": "PR 301 label linkage noted",
      "description": "PR 301 entry acknowledges the ai:implement-spec label or SPEC-88 closure and notes that the linkage should be confirmed after merge",
      "max_score": 8
    },
    {
      "name": "PR 302 blocked on CI",
      "description": "PR 302 is decided as 'blocked' because ci/test-integration is failing — not as 'merge'",
      "max_score": 12
    },
    {
      "name": "PR 302 blocker described",
      "description": "PR 302 blocked entry names the failing check (ci/test-integration or integration tests) as the specific blocker",
      "max_score": 8
    },
    {
      "name": "PR 303 Dependabot flag",
      "description": "PR 303 (Dependabot PR) is NOT decided as immediate merge — the agent notes it requires triage-dependabot first or flags it as a different workflow",
      "max_score": 15
    },
    {
      "name": "No --admin flag mentioned",
      "description": "merge_decisions.md does NOT suggest using --admin flag to force-merge any of the PRs",
      "max_score": 10
    },
    {
      "name": "All 3 PRs covered",
      "description": "merge_decisions.md contains a section for each of PR 301, 302, and 303",
      "max_score": 15
    }
  ]
}
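The rubric above does not say how a `weighted_checklist` is aggregated into a final score. A minimal sketch, assuming the score is simply points earned divided by the sum of `max_score` values, could look like the following. The `awarded` mapping (item name to points given by a grader) and the function name are hypothetical.

```python
def score_checklist(checklist, awarded):
    """Return earned / possible points as a fraction in [0, 1].

    Assumption: each item contributes between 0 and its max_score,
    and items missing from `awarded` earn 0.
    """
    possible = sum(item["max_score"] for item in checklist)
    earned = 0
    for item in checklist:
        points = awarded.get(item["name"], 0)
        # Clamp so a grader cannot exceed an item's weight or go negative.
        earned += max(0, min(points, item["max_score"]))
    return earned / possible if possible else 0.0


# Two items taken from the rubric, with hypothetical grader output.
sample = [
    {"name": "PR 302 blocked on CI", "max_score": 12},
    {"name": "No --admin flag mentioned", "max_score": 10},
]
fraction = score_checklist(sample, {"PR 302 blocked on CI": 12,
                                    "No --admin flag mentioned": 5})
```

Here `fraction` is 17/22. The ten `max_score` values in the rubric sum to 100, so under this assumption the points earned would read directly as a percentage.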