General-purpose coding policy for Baruch's AI agents
90
91%
Does it follow best practices?
Impact
90%
1.76x
Average score across 18 eval scenarios
Advisory
Suggest reviewing before use
scenario-a bleeding detected | 0% | 100%
scenario-a bleeding fixed | 0% | 100%
scenario-a leaking detected | 100% | 100%
scenario-a leaking fixed | 100% | 100%
scenario-b vague messages detected | 100% | 100%
scenario-b vague messages fixed | 100% | 100%
scenario-b misaligned criteria detected | 100% | 0%
scenario-b misaligned criteria fixed | 100% | 0%
scenario-c deleted | 100% | 100%
audit report produced | 100% | 100%
Uses GraphQL `requestReviews` mutation | 0% | 100%
Inline comment explains why REST doesn't work | 37% | 100%
Pinned bot ID with fallback to dynamic discovery | 30% | 100%
Resolves the PR's GraphQL node ID | 0% | 100%
Verifies the review request was registered | 100% | 100%
Feature-branch guard | 100% | 100%
PR title follows conventional-commits format | 100% | 100%
PR body structure | 57% | 100%
Pre-push readiness checks | 0% | 66%
No hardcoded inputs in the script body | 100% | 100%
CI failure: fix required | 100% | 100%
Applies the reasonable suggestion | 100% | 100%
Declines the over-engineered suggestion | 100% | 100%
All three threads get replies | 100% | 100%
Accept reply uses the `Fixed in <sha>` format | 0% | 0%
Decline reply uses the `Declining — <reason>` format | 0% | 0%
Decline reply cites a verifiable reference | 100% | 62%
Fixes pushed to the same branch | 100% | 100%
No dangling threads before merge | 100% | 100%
Patch: no manual manifest update | 66% | 0%
Patch: explains CI auto-bump | 0% | 0%
Minor: manifest bumped to `1.5.0` | 100% | 100%
Major: manifest bumped to `2.0.0` | 100% | 100%
Major: flags breaking-change impact for downstream | 100% | 100%
Release sequencing: patch first, major last | 100% | 100%
Readiness gate: tests + linter | 100% | 70%
Runbook covers all three changes separately | 100% | 100%
names canonical cause | 66% | 100%
prescribes rewrite-criteria | 100% | 100%
rejects fix-task and retire | 100% | 80%
replacement criteria are tile-specific | 100% | 75%
names canonical cause | 0% | 100%
prescribes fix-task | 100% | 100%
preserves the criterion | 100% | 100%
task rewrite strips technique, keeps situation | 15% | 100%
names canonical cause | 42% | 100%
prescribes retire | 100% | 100%
reasoning cites baseline equivalence | 100% | 100%
no spurious fix-task or rewrite-criteria | 100% | 100%
Uses `gh pr checks` with structured output | 0% | 100%
Uses `gh api .../pulls/<N>/reviews` for review state | 100% | 0%
Uses `gh api .../pulls/<N>/comments` for inline comments | 100% | 100%
Does NOT use `/issues/<N>/comments` | 0% | 0%
Retrieves per-reviewer state distinctly | 100% | 60%
No hardcoded PR, owner, or repo in the script body | 100% | 100%
Waits for CI to finish before surfacing state | 100% | 100%
Surfaces CI state in the summary | 50% | 100%
Surfaces review states in the summary | 100% | 50%
Surfaces inline comment content or count | 100% | 100%
Surfaces merge-readiness state for conflict diagnosis | 0% | 0%
Diagnoses why the fork PR is not reviewed | 40% | 100%
Brings the branch into the base repo | 0% | 100%
Preserves the contributor's commits unchanged | 0% | 100%
Opens a same-repo PR from the adopted branch | 0% | 100%
Leaves the original fork PR open | 50% | 100%
Links the adopted PR back to the original | 0% | 100%
Does not fabricate an Author-Model declaration | 100% | 100%
Identifies the PR as originating in the repository itself | 8% | 20%
Recognizes the reviewer already covers it | 40% | 80%
Creates no branch and pushes nothing | 100% | 100%
Opens no duplicate PR | 100% | 100%
Reports the PR's status | 100% | 100%
Creates a feature branch | 75% | 100%
Plan populates .github/workflows with both source + lock pairs | 0% | 100%
Commits both sources and both locks | 0% | 100%
Pushes and opens a PR | 100% | 100%
PR body lists OPENAI_API_KEY | 0% | 100%
PR body lists ANTHROPIC_API_KEY | 0% | 100%
PR body lists TESSL_TOKEN | 0% | 100%
Does not merge | 100% | 100%
Does not bypass pre-commit hooks | 100% | 100%
Explains the cross-family reviewer rationale | 0% | 100%
Rule file frontmatter declares alwaysApply: false | 0% | 100%
Rule file frontmatter declares applyTo with glob patterns | 36% | 100%
applyTo value combines globs with a natural-language clause | 0% | 100%
plugin.json rules array includes the new rule path | 100% | 100%
Rule body has H1 title matching the filename concept | 100% | 100%
Existing rules and manifest entries are preserved unchanged | 100% | 100%
Rule file frontmatter flipped to alwaysApply: false | 100% | 100%
Rule file frontmatter gains applyTo with glob patterns | 54% | 100%
applyTo value combines globs with a natural-language clause | 0% | 100%
plugin.json carries no per-rule config and its rules array is intact | 100% | 100%
Rule body content is preserved unchanged | 100% | 100%
Existing rule (commit-conventions) is preserved unchanged | 100% | 100%
Rule file frontmatter declares alwaysApply: true | 0% | 100%
Rule file frontmatter declares no scoping fields | 0% | 100%
plugin.json rules array includes the new rule path | 100% | 100%
Rule body covers the stdlib-first practice | 100% | 100%
Rule body covers the dependency-pinning practice | 100% | 100%
Existing rules and manifest entries are preserved unchanged | 0% | 0%
Detects existing workflow | 40% | 100%
Refuses to overwrite | 0% | 96%
No downstream actions after refusal | 0% | 86%
Explains why the guard matters | 0% | 86%
Offers an actionable next step | 0% | 100%
Preserves existing file | 0% | 100%
identifies suite as clean | 0% | 100%
does not fabricate diagnoses | 0% | 100%
recognizes negative-case acceptability | 0% | 100%
output is appropriately minimal | 40% | 100%
Merge strategy uses `gh pr merge --merge` | 75% | 100%
Merge includes `--delete-branch` | 0% | 100%
Fast-forward-only pull after merge | 100% | 100%
Safe local-branch delete with `git branch -d` | 0% | 100%
Stale remote-tracking refs pruned | 100% | 100%
Pre-merge CI gate | 60% | 100%
Pre-merge review gate | 50% | 16%
Verifies merge landed on main | 60% | 100%
Pre-merge registry baseline captured | 0% | 100%
SHA-bound publish-run resolution | 30% | 100%
Watch publish run to terminal state | 12% | 100%
Conjunction check: run-success AND registry-advance AND moderation-clear | 0% | 100%
Final summary includes merged PR URL | 50% | 100%
Graceful failure on unmet preconditions | 100% | 100%
No hardcoded PR, owner, or repo in the script body | 100% | 100%
Runs the migration mechanics | 35% | 100%
Renames the tessl tile CLI alias | 100% | 100%
Keeps the v1/tiles API route | 0% | 100%
Keeps code identifiers | 0% | 100%
Keeps the legacy tile.json reference | 100% | 100%
Distinguishes prose from contract rather than blanket-replacing | 25% | 100%