General-purpose coding policy for Baruch's AI agents
95
91%
Does it follow best practices?
Impact
96%
1.31x
Average score across 10 eval scenarios
Advisory
Suggest reviewing before use
{
"context": "Tests whether the agent applies this tile's prescribed merge-and-cleanup sequence — not whether it can reason its way to some working equivalent. The tile teaches a specific combination of flags and commands for consistency across a team (merge-commit strategy, fast-forward-only pull, safe branch delete, remote pruning). Baseline agents typically pick different defaults (squash, plain pull, force delete) — each specific-prescription criterion measures whether the tile was applied.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Merge strategy uses `gh pr merge --merge`",
"description": "The script calls `gh pr merge ... --merge` (the tile prescribes the merge-commit strategy for this workflow). Using `--squash` or `--rebase` scores zero; those are different strategies that the tile does not prescribe here",
"max_score": 10
},
{
"name": "Merge includes `--delete-branch`",
"description": "The `gh pr merge` invocation passes `--delete-branch` so the remote feature branch is deleted on merge. A separate `git push origin --delete` after the merge is acceptable but less clean than the tile's prescribed one-flag approach",
"max_score": 5
},
{
"name": "Fast-forward-only pull after merge",
"description": "Uses `git pull --ff-only` (not plain `git pull`, which can create a spurious merge commit on local main). The `--ff-only` prescription is tile-specific — baseline agents default to plain `git pull`",
"max_score": 10
},
{
"name": "Safe local-branch delete with `git branch -d`",
"description": "Deletes the local feature branch with `git branch -d` (safe: refuses to delete unmerged work). `git branch -D` (force delete) scores zero because the tile explicitly prefers the safe form",
"max_score": 8
},
{
"name": "Stale remote-tracking refs pruned",
"description": "Runs `git remote prune origin` (or `git fetch --prune`) so `origin/*` refs pointing at deleted remote branches are cleaned up. Skipping prune scores zero — the tile prescribes this as part of cleanup",
"max_score": 7
},
{
"name": "Pre-merge CI gate",
"description": "The script refuses to merge when CI is not green — exits non-zero with a diagnostic; does NOT proceed to `gh pr merge` on pending or failing CI",
"max_score": 15
},
{
"name": "Pre-merge review gate",
"description": "The script refuses to merge when a blocking review is outstanding or any review thread is unresolved. At minimum, checks that no review has state `CHANGES_REQUESTED`",
"max_score": 10
},
{
"name": "Verifies merge landed on main",
"description": "Explicit post-merge check that the PR's commits are present on main (`gh pr view`, `git log origin/main`, or equivalent). A silent assumption that `gh pr merge` succeeded scores zero",
"max_score": 8
},
{
"name": "Publish CI verification",
"description": "The script verifies that the post-merge publish/release workflow was triggered. Acceptable: `gh run list` on the publish workflow, `gh run view` on the triggered run, or equivalent",
"max_score": 10
},
{
"name": "Final summary includes merged PR URL",
"description": "The script prints a final summary naming the merged PR URL and the publish-workflow status, so the developer doesn't have to re-check manually",
"max_score": 7
},
{
"name": "Graceful failure on unmet preconditions",
"description": "When CI is red or reviews are outstanding, the script exits non-zero with a stderr diagnostic naming the specific unmet condition — does NOT silently skip or mask the failure",
"max_score": 5
},
{
"name": "No hardcoded PR, owner, or repo in the script body",
"description": "OBSERVABLE: the script body contains no literal for the target PR number, owner, or repo — all three come from runtime inputs (args or env vars). Running against a new target requires changing only the invocation, not the script source. Pinning any of them in the script source scores zero",
"max_score": 5
}
]
}
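
The core of the sequence the checklist describes can be sketched as a small shell function. This is a minimal illustration under stated assumptions, not the tile's own script: the argument order (PR number, `owner/repo`, branch name), the helper name `fail`, and the default branch name `main` are hypothetical. It covers the CI gate, the `CHANGES_REQUESTED` review gate, the merge, the post-merge state check, and the local cleanup. It leaves out the unresolved-thread check and the publish-workflow verification, since the source does not name the workflow.

```shell
#!/usr/bin/env bash
# Sketch only. Argument order, the `fail` helper, and the branch name "main"
# are assumptions for illustration; they do not come from the checklist.

fail() {
  echo "merge-and-cleanup: $*" >&2
  return 1
}

merge_and_cleanup() {
  local pr="$1" repo="$2" branch="$3"
  if [ -z "$pr" ] || [ -z "$repo" ] || [ -z "$branch" ]; then
    echo "usage: merge_and_cleanup <pr-number> <owner/repo> <branch>" >&2
    return 2
  fi

  # CI gate: `gh pr checks` exits non-zero when checks are failing or pending.
  gh pr checks "$pr" --repo "$repo" >/dev/null \
    || { fail "CI is not green for PR #$pr"; return 1; }

  # Review gate: refuse to merge while a review requests changes.
  local decision
  decision="$(gh pr view "$pr" --repo "$repo" --json reviewDecision --jq .reviewDecision)" \
    || { fail "could not read review state for PR #$pr"; return 1; }
  if [ "$decision" = "CHANGES_REQUESTED" ]; then
    fail "PR #$pr has a review requesting changes"
    return 1
  fi

  # Merge with a merge commit and delete the remote branch in the same call.
  gh pr merge "$pr" --repo "$repo" --merge --delete-branch \
    || { fail "gh pr merge failed for PR #$pr"; return 1; }

  # Do not assume the merge landed; ask GitHub for the PR state.
  local state
  state="$(gh pr view "$pr" --repo "$repo" --json state --jq .state)"
  [ "$state" = "MERGED" ] \
    || { fail "PR #$pr is in state '$state', expected MERGED"; return 1; }

  # Local cleanup: fast-forward-only pull, safe delete, prune stale refs.
  git checkout main || { fail "could not check out main"; return 1; }
  git pull --ff-only || { fail "main cannot be fast-forwarded"; return 1; }
  # gh may already have removed the local branch, so delete only if present.
  if git show-ref --verify --quiet "refs/heads/$branch"; then
    git branch -d "$branch" || { fail "branch $branch has unmerged work"; return 1; }
  fi
  git fetch --prune origin || { fail "git fetch --prune failed"; return 1; }

  # Final summary with the merged PR URL.
  local url
  url="$(gh pr view "$pr" --repo "$repo" --json url --jq .url)"
  echo "Merged: $url"
}

# Run only when arguments are supplied, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
  merge_and_cleanup "$@"
fi
```

Because the PR number, repository, and branch all arrive as arguments, pointing the sketch at a different target changes only the invocation. Each gate returns non-zero and writes the specific unmet condition to stderr before any merge is attempted.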