Name: jbaruch/coding-policy
Rating: 95.56 (1 reviews)
Author: jbaruch


jbaruch/coding-policy

General-purpose coding policy for Baruch's AI agents

1.31x

Quality

91%

Does it follow best practices?

Impact

96%

1.31x

Average score across 10 eval scenarios

Security

Advisory

Suggest reviewing before use

Evaluation results

100%

Wire Up Automated Policy Review in a Consumer Repo

Criteria

Without context

With context

Creates a feature branch

100%

Populates .github/workflows with both source + lock pairs

73%

100%

Commits both sources and both locks

100%

Pushes and opens a PR

100%

PR body lists OPENAI_API_KEY

100%

PR body lists ANTHROPIC_API_KEY

100%

PR body lists TESSL_TOKEN

100%

Does not merge

100%

Does not bypass pre-commit hooks

100%

Explains the cross-family reviewer rationale

100%

35%

Install Policy Reviewer When Tooling Is Missing

Criteria

Without context

With context

Identifies the missing dependency

32%

100%

Stops before making changes

100%

Provides the install command

40%

100%

Explains why gh-aw is needed

80%

100%

Invites re-invocation

100%

Re-installing Policy Review Over an Existing Workflow

Criteria

Without context

With context

Detects existing workflow

100%

Refuses to overwrite

100%

No downstream actions after refusal

100%

Explains why the guard matters

100%

Offers an actionable next step

100%

Preserves existing file

100%

Eval Coverage Gap Analysis

Criteria

Without context

With context

Identifies at least two uncovered decision branches

100%

Writes new scenario directories

100%

Criteria files follow the weighted_checklist format prescribed by the tile

100%

Criteria weights sum to 100 and are not equally distributed

100%

New task.md files pass the no-bleeding check

66%

100%

New criteria don't leak tile internals

100%

Failure descriptions are specific

100%

At least one new scenario exercises a negative case

100%

Coverage analysis justifies each gap

100%
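The weight criterion above (weights sum to 100 and are not equally distributed) can be sketched as a small check. This is a minimal illustration, not the tile's own validator: the weighted_checklist file format is not shown on this page, so the function assumes the weights have already been extracted into a plain list of integers, and it reads "equally distributed" as "every weight is identical". The name `check_weights` is hypothetical.

```python
def check_weights(weights):
    """Return a list of problems found in one scenario's criterion weights.

    Assumption: `weights` is a plain list of integers already pulled out of
    the criteria file; the real file layout is not shown in the source.
    """
    problems = []
    total = sum(weights)
    if total != 100:
        problems.append(f"weights sum to {total}, expected 100")
    # Assumption: "equally distributed" means all weights are the same value.
    if len(weights) > 1 and len(set(weights)) == 1:
        problems.append("weights are equally distributed")
    return problems


if __name__ == "__main__":
    print(check_weights([40, 30, 20, 10]))
    print(check_weights([25, 25, 25, 25]))
```

An empty list means the weights pass both parts of the criterion.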

20%

Eval Scenario Quality Audit

Criteria

Without context

With context

scenario-a bleeding detected

100%

scenario-a bleeding fixed

100%

scenario-a leaking detected

100%

scenario-a leaking fixed

100%

scenario-b vague messages detected

100%

scenario-b vague messages fixed

100%

scenario-b misaligned criteria detected

100%

scenario-b misaligned criteria fixed

100%

scenario-c deleted

100%

audit report produced

100%

15%

PR Status Monitor Script

Criteria

Without context

With context

Uses `gh pr checks` with structured output

100%

Uses `gh api .../pulls/<N>/reviews` for review state

100%

Uses `gh api .../pulls/<N>/comments` for inline comments

100%

Does NOT use `/issues/<N>/comments`

100%

Retrieves per-reviewer state distinctly

100%

No hardcoded PR, owner, or repo in the script body

100%

Waits for CI to finish before surfacing state

100%

Surfaces CI state in the summary

100%

Surfaces review states in the summary

100%

Surfaces inline comment content or count

100%
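As an illustration of the "per-reviewer state" criterion above, the sketch below reduces a list of reviews to one state per reviewer. It assumes the input is the decoded JSON list returned by `gh api .../pulls/<N>/reviews`, where each item carries `user.login`, `state`, and `submitted_at`; the function name and the sample logins are hypothetical, and the evaluated script may work differently.

```python
def latest_state_per_reviewer(reviews):
    """Map each reviewer login to the state of their most recent review.

    Assumption: `reviews` is the decoded JSON list from the pull request
    reviews endpoint, with ISO 8601 UTC timestamps in `submitted_at`
    (these sort correctly as plain strings).
    """
    latest = {}
    ordered = sorted(reviews, key=lambda r: r.get("submitted_at") or "")
    for review in ordered:
        # Later reviews overwrite earlier ones from the same reviewer.
        latest[review["user"]["login"]] = review["state"]
    return latest


if __name__ == "__main__":
    sample = [  # hypothetical data for illustration only
        {"user": {"login": "alice"}, "state": "CHANGES_REQUESTED",
         "submitted_at": "2024-01-01T10:00:00Z"},
        {"user": {"login": "bob"}, "state": "COMMENTED",
         "submitted_at": "2024-01-01T11:00:00Z"},
        {"user": {"login": "alice"}, "state": "APPROVED",
         "submitted_at": "2024-01-02T09:00:00Z"},
    ]
    print(latest_state_per_reviewer(sample))
```

Keeping one entry per login is what lets a summary report each reviewer separately instead of a single merged verdict.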

98%

82%

Automate PR Creation and Code Review Request

Criteria

Without context

With context

Uses GraphQL `requestReviews` mutation

100%

Inline comment explains why REST doesn't work

100%

Pinned bot ID with fallback to dynamic discovery

100%

Resolves the PR's GraphQL node ID

100%

Verifies the review request was registered

100%

Feature-branch guard

100%

PR title follows conventional-commits format

100%

PR body structure

100%

Pre-push readiness checks

33%

No hardcoded inputs in the script body

100%
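The conventional-commits criterion for PR titles above could be checked with a pattern along these lines. This is a rough sketch under assumptions: the list of allowed types is the commonly used set rather than anything taken from this policy, and `is_conventional_title` is a hypothetical helper name.

```python
import re

# Assumption: a commonly used type list; the policy's own list is not shown.
ALLOWED_TYPES = ("feat", "fix", "docs", "style", "refactor", "perf",
                 "test", "build", "ci", "chore", "revert")

TITLE_PATTERN = re.compile(
    r"^(?:" + "|".join(ALLOWED_TYPES) + r")"  # type
    r"(?:\([^()\s]+\))?"                       # optional (scope)
    r"!?"                                      # optional breaking-change marker
    r": \S.*$"                                 # colon, space, non-empty description
)


def is_conventional_title(title):
    """Return True if `title` looks like `type(scope)!: description`."""
    return TITLE_PATTERN.match(title) is not None


if __name__ == "__main__":
    for title in ("feat(api): add retry", "fix!: drop legacy flag", "Add retry"):
        print(title, "->", is_conventional_title(title))
```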

95%

40%

PR Merge and Branch Cleanup Automation

Criteria

Without context

With context

Merge strategy uses `gh pr merge --merge`

100%

Merge includes `--delete-branch`

100%

Fast-forward-only pull after merge

100%

Safe local-branch delete with `git branch -d`

25%

100%

Stale remote-tracking refs pruned

100%

Pre-merge CI gate

20%

100%

Pre-merge review gate

100%

Verifies merge landed on main

75%

62%

Publish CI verification

80%

100%

Final summary includes merged PR URL

100%

71%

Graceful failure on unmet preconditions

40%

100%

No hardcoded PR, owner, or repo in the script body

100%

73%

Code Review Response Guide

Criteria

Without context

With context

CI failure: fix required

100%

Applies the reasonable suggestion

100%

Declines the over-engineered suggestion

100%

All three threads get replies

100%

Accept reply uses the `Fixed in <sha>` format

Decline reply uses the `Declining — <reason>` format

46%

Decline reply cites a verifiable reference

100%

Fixes pushed to the same branch

100%

No dangling threads before merge

100%

60%

100%

27%

Release Runbook for a Multi-Change Sprint

Criteria

Without context

With context

Patch: no manual manifest update

100%

Patch: explains CI auto-bump

100%

Minor: manifest bumped to `1.5.0`

100%

Major: manifest bumped to `2.0.0`

100%

Major: flags breaking-change impact for downstream

100%

Release sequencing: patch first, major last

100%

Readiness gate: tests + linter

50%

100%

Runbook covers all three changes separately

100%
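The version targets named in the criteria above (`1.5.0` for the minor change, `2.0.0` for the major change) follow ordinary semantic-versioning arithmetic, which can be sketched as below. The starting version `1.4.0` is an assumption made for illustration; the page only states the two targets. `bump` is a hypothetical helper and does not represent the policy's CI auto-bump.

```python
def bump(version, kind):
    """Return the next MAJOR.MINOR.PATCH version for the given bump kind."""
    major, minor, patch = (int(part) for part in version.split("."))
    if kind == "major":
        return f"{major + 1}.0.0"          # breaking change resets minor and patch
    if kind == "minor":
        return f"{major}.{minor + 1}.0"    # new feature resets patch
    if kind == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump kind: {kind}")


if __name__ == "__main__":
    version = "1.4.0"  # assumed starting point, not stated in the source
    for kind in ("patch", "minor", "major"):  # patch first, major last
        version = bump(version, kind)
        print(kind, "->", version)
```

Releasing in patch, minor, major order means each step lands on a version that the next step can bump cleanly.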

Evaluated: about 7 hours ago
Agent: Claude
Model: Claude Sonnet 4.6

Table of Contents

Wire Up Automated Policy Review in a Consumer Repo

Install Policy Reviewer When Tooling Is Missing

Re-installing Policy Review Over an Existing Workflow

Eval Coverage Gap Analysis

Eval Scenario Quality Audit

PR Status Monitor Script

Automate PR Creation and Code Review Request

PR Merge and Branch Cleanup Automation

Code Review Response Guide

Release Runbook for a Multi-Change Sprint