CtrlK
BlogDocsLog inGet started
Tessl Logo

jbaruch/coding-policy

General-purpose coding policy for Baruch's AI agents

95

1.31x
Quality

91%

Does it follow best practices?

Impact

96%

1.31x

Average score across 10 eval scenarios

SecuritybySnyk

Advisory

Suggest reviewing before use

Overview
Quality
Evals
Security
Files

criteria.jsonevals/scenario-8/

{
  "context": "Tests whether the agent applies this tile's prescribed merge-and-cleanup sequence — not whether it can reason its way to some working equivalent. The tile teaches a specific combination of flags and commands for consistency across a team (merge-commit strategy, fast-forward-only pull, safe branch delete, remote pruning). Baseline agents typically pick different defaults (squash, plain pull, force delete) — each specific-prescription criterion measures whether the tile was applied.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Merge strategy uses `gh pr merge --merge`",
      "description": "The script calls `gh pr merge ... --merge` (the tile prescribes merge-commit strategy for this workflow). `--squash` or `--rebase` score zero — those are different strategies the tile does not prescribe here",
      "max_score": 10
    },
    {
      "name": "Merge includes `--delete-branch`",
      "description": "The `gh pr merge` invocation passes `--delete-branch` so the remote feature branch is deleted on merge. Separate `git push origin --delete` AFTER merge is acceptable but less clean than the tile's prescribed one-flag approach",
      "max_score": 5
    },
    {
      "name": "Fast-forward-only pull after merge",
      "description": "Uses `git pull --ff-only` (not plain `git pull`, which can create a spurious merge commit on local main). The `--ff-only` prescription is tile-specific — baseline agents default to plain `git pull`",
      "max_score": 10
    },
    {
      "name": "Safe local-branch delete with `git branch -d`",
      "description": "Deletes the local feature branch with `git branch -d` (safe, refuses to delete un-merged work). `git branch -D` (force delete) scores zero — the tile explicitly prefers the safe form",
      "max_score": 8
    },
    {
      "name": "Stale remote-tracking refs pruned",
      "description": "Runs `git remote prune origin` (or `git fetch --prune`) so `origin/*` refs pointing at deleted remote branches are cleaned up. Skipping prune scores zero — the tile prescribes this as part of cleanup",
      "max_score": 7
    },
    {
      "name": "Pre-merge CI gate",
      "description": "The script refuses to merge when CI is not green — exits non-zero with a diagnostic; does NOT proceed to `gh pr merge` on pending or failing CI",
      "max_score": 15
    },
    {
      "name": "Pre-merge review gate",
      "description": "The script refuses to merge when a blocking review is outstanding or any review thread is unresolved. At minimum, checks that no review has state `CHANGES_REQUESTED`",
      "max_score": 10
    },
    {
      "name": "Verifies merge landed on main",
      "description": "Explicit post-merge check that the PR's commits are present on main (`gh pr view`, `git log origin/main`, or equivalent). A silent assumption that `gh pr merge` succeeded scores zero",
      "max_score": 8
    },
    {
      "name": "Publish CI verification",
      "description": "The script verifies that the post-merge publish/release workflow was triggered. Acceptable: `gh run list` on the publish workflow, `gh run view` on the triggered run, or equivalent",
      "max_score": 10
    },
    {
      "name": "Final summary includes merged PR URL",
      "description": "The script prints a final summary naming the merged PR URL and the publish-workflow status, so the developer doesn't have to re-check manually",
      "max_score": 7
    },
    {
      "name": "Graceful failure on unmet preconditions",
      "description": "When CI is red or reviews are outstanding, the script exits non-zero with a stderr diagnostic naming the specific unmet condition — does NOT silently skip or mask the failure",
      "max_score": 5
    },
    {
      "name": "No hardcoded PR, owner, or repo in the script body",
      "description": "OBSERVABLE: the script body contains no literal for the target PR number, owner, or repo — all three come from runtime inputs (args or env vars). Running against a new target requires changing only the invocation, not the script source. Pinning any of them in the script source scores zero",
      "max_score": 5
    }
  ]
}

evals

README.md

tile.json