General-purpose coding policy for Baruch's AI agents
88
91%
Does it follow best practices?
Impact
88%
1.72x
Average score across 18 eval scenarios
Advisory
Suggest reviewing before use

| Criterion | Baseline | With tile |
|---|---:|---:|
| scenario-a bleeding detected | 0% | 100% |
| scenario-a bleeding fixed | 0% | 100% |
| scenario-a leaking detected | 100% | 100% |
| scenario-a leaking fixed | 100% | 100% |
| scenario-b vague messages detected | 100% | 100% |
| scenario-b vague messages fixed | 100% | 100% |
| scenario-b misaligned criteria detected | 100% | 100% |
| scenario-b misaligned criteria fixed | 100% | 100% |
| scenario-c deleted | 100% | 100% |
| audit report produced | 100% | 100% |

| Criterion | Baseline | With tile |
|---|---:|---:|
| Uses GraphQL `requestReviews` mutation | 0% | 100% |
| Inline comment explains why REST doesn't work | 12% | 100% |
| Pinned bot ID with fallback to dynamic discovery | 40% | 100% |
| Resolves the PR's GraphQL node ID | 0% | 100% |
| Verifies the review request was registered | 71% | 100% |
| Feature-branch guard | 100% | 100% |
| PR title follows conventional-commits format | 62% | 100% |
| PR body structure | 71% | 100% |
| Pre-push readiness checks | 0% | 0% |
| No hardcoded inputs in the script body | 100% | 100% |

| Criterion | Baseline | With tile |
|---|---:|---:|
| CI failure: fix required | 100% | 100% |
| Applies the reasonable suggestion | 100% | 100% |
| Declines the over-engineered suggestion | 100% | 100% |
| All three threads get replies | 100% | 100% |
| Accept reply uses the `Fixed in <sha>` format | 0% | 0% |
| Decline reply uses the `Declining — <reason>` format | 46% | 0% |
| Decline reply cites a verifiable reference | 100% | 100% |
| Fixes pushed to the same branch | 100% | 100% |
| No dangling threads before merge | 100% | 100% |

| Criterion | Baseline | With tile |
|---|---:|---:|
| Patch: no manual manifest update | 0% | 0% |
| Patch: explains CI auto-bump | 0% | 0% |
| Minor: manifest bumped to `1.5.0` | 100% | 100% |
| Major: manifest bumped to `2.0.0` | 100% | 100% |
| Major: flags breaking-change impact for downstream | 100% | 100% |
| Release sequencing: patch first, major last | 100% | 100% |
| Readiness gate: tests + linter | 40% | 50% |
| Runbook covers all three changes separately | 100% | 100% |

| Criterion | Baseline | With tile |
|---|---:|---:|
| names canonical cause | 50% | 100% |
| prescribes rewrite-criteria | 100% | 100% |
| rejects fix-task and retire | 100% | 85% |
| replacement criteria are tile-specific | 100% | 100% |

| Criterion | Baseline | With tile |
|---|---:|---:|
| names canonical cause | 66% | 100% |
| prescribes fix-task | 100% | 100% |
| preserves the criterion | 100% | 100% |
| task rewrite strips technique, keeps situation | 100% | 100% |

| Criterion | Baseline | With tile |
|---|---:|---:|
| names canonical cause | 14% | 100% |
| prescribes retire | 0% | 100% |
| reasoning cites baseline equivalence | 90% | 100% |
| no spurious fix-task or rewrite-criteria | 0% | 100% |

| Criterion | Baseline | With tile |
|---|---:|---:|
| Uses `gh pr checks` with structured output | 0% | 100% |
| Uses `gh api .../pulls/<N>/reviews` for review state | 100% | 100% |
| Uses `gh api .../pulls/<N>/comments` for inline comments | 100% | 100% |
| Does NOT use `/issues/<N>/comments` | 100% | 100% |
| Retrieves per-reviewer state distinctly | 100% | 100% |
| No hardcoded PR, owner, or repo in the script body | 100% | 100% |
| Waits for CI to finish before surfacing state | 100% | 100% |
| Surfaces CI state in the summary | 100% | 100% |
| Surfaces review states in the summary | 100% | 100% |
| Surfaces inline comment content or count | 100% | 100% |
| Surfaces merge-readiness state for conflict diagnosis | 0% | 100% |

| Criterion | Baseline | With tile |
|---|---:|---:|
| Diagnoses why the fork PR is not reviewed | 100% | 100% |
| Brings the branch into the base repo | 100% | 100% |
| Preserves the contributor's commits unchanged | 100% | 100% |
| Opens a same-repo PR from the adopted branch | 100% | 100% |
| Leaves the original fork PR open | 100% | 100% |
| Links the adopted PR back to the original | 100% | 100% |
| Does not fabricate an Author-Model declaration | 100% | 100% |

| Criterion | Baseline | With tile |
|---|---:|---:|
| Identifies the PR as originating in the repository itself | 12% | 48% |
| Recognizes the reviewer already covers it | 48% | 72% |
| Creates no branch and pushes nothing | 100% | 100% |
| Opens no duplicate PR | 100% | 100% |
| Reports the PR's status | 100% | 100% |

| Criterion | Baseline | With tile |
|---|---:|---:|
| Creates a feature branch | 100% | 100% |
| Plan populates `.github/workflows` with both source + lock pairs | 73% | 100% |
| Commits both sources and both locks | 100% | 100% |
| Pushes and opens a PR | 100% | 100% |
| PR body lists `OPENAI_API_KEY` | 100% | 100% |
| PR body lists `ANTHROPIC_API_KEY` | 100% | 100% |
| PR body lists `TESSL_TOKEN` | 100% | 100% |
| Does not merge | 100% | 100% |
| Does not bypass pre-commit hooks | 100% | 100% |
| Explains the cross-family reviewer rationale | 100% | 100% |

| Criterion | Baseline | With tile |
|---|---:|---:|
| Rule file frontmatter declares `alwaysApply: false` | 0% | 100% |
| Rule file frontmatter declares `applyTo` with glob patterns | 100% | 100% |
| `applyTo` value combines globs with a natural-language clause | 0% | 100% |
| `plugin.json` rules array includes the new rule path | 100% | 100% |
| Rule body has H1 title matching the filename concept | 100% | 100% |
| Existing rules and manifest entries are preserved unchanged | 100% | 100% |

| Criterion | Baseline | With tile |
|---|---:|---:|
| Rule file frontmatter flipped to `alwaysApply: false` | 0% | 100% |
| Rule file frontmatter gains `applyTo` with glob patterns | 100% | 100% |
| `applyTo` value combines globs with a natural-language clause | 0% | 100% |
| `plugin.json` carries no per-rule config and its rules array is intact | 100% | 100% |
| Rule body content is preserved unchanged | 100% | 100% |
| Existing rule (`commit-conventions`) is preserved unchanged | 100% | 100% |

| Criterion | Baseline | With tile |
|---|---:|---:|
| Rule file frontmatter declares `alwaysApply: true` | 100% | 100% |
| Rule file frontmatter declares no scoping fields | 100% | 100% |
| `plugin.json` rules array includes the new rule path | 100% | 100% |
| Rule body covers the stdlib-first practice | 100% | 100% |
| Rule body covers the dependency-pinning practice | 100% | 100% |
| Existing rules and manifest entries are preserved unchanged | 100% | 100% |

| Criterion | Baseline | With tile |
|---|---:|---:|
| Detects existing workflow | 100% | 100% |
| Refuses to overwrite | 100% | 80% |
| No downstream actions after refusal | 100% | 26% |
| Explains why the guard matters | 100% | 100% |
| Offers an actionable next step | 100% | 100% |
| Preserves existing file | 100% | 80% |

| Criterion | Baseline | With tile |
|---|---:|---:|
| identifies suite as clean | 0% | 0% |
| does not fabricate diagnoses | 0% | 0% |
| recognizes negative-case acceptability | 0% | 0% |
| output is appropriately minimal | 30% | 0% |

| Criterion | Baseline | With tile |
|---|---:|---:|
| Merge strategy uses `gh pr merge --merge` | 0% | 87% |
| Merge includes `--delete-branch` | 100% | 100% |
| Fast-forward-only pull after merge | 100% | 100% |
| Safe local-branch delete with `git branch -d` | 0% | 100% |
| Stale remote-tracking refs pruned | 100% | 100% |
| Pre-merge CI gate | 40% | 100% |
| Pre-merge review gate | 50% | 100% |
| Verifies merge landed on main | 100% | 100% |
| Pre-merge registry baseline captured | 0% | 100% |
| SHA-bound publish-run resolution | 0% | 100% |
| Watch publish run to terminal state | 50% | 100% |
| Conjunction check: run-success AND registry-advance AND moderation-clear | 0% | 100% |
| Final summary includes merged PR URL | 100% | 100% |
| Graceful failure on unmet preconditions | 66% | 100% |
| No hardcoded PR, owner, or repo in the script body | 100% | 100% |

| Criterion | Baseline | With tile |
|---|---:|---:|
| Runs the migration mechanics | 28% | 100% |
| Renames the `tessl tile` CLI alias | 100% | 100% |
| Keeps the `v1/tiles` API route | 0% | 100% |
| Keeps code identifiers | 0% | 100% |
| Keeps the legacy `tile.json` reference | 57% | 100% |
| Distinguishes prose from contract rather than blanket-replacing | 0% | 100% |