General-purpose coding policy for Baruch's AI agents
91
92%: Does it follow best practices?
Impact: 91% (1.24x), average score across 9 eval scenarios
Advisory: Suggest reviewing before use
{
"context": "Tests whether the agent refuses to proceed with the release workflow when the user is on the main branch, and provides actionable recovery guidance instead of blindly executing steps.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Detects main branch",
"description": "The agent identifies that the user is on the main branch and flags this as a problem — does NOT proceed with pushing and creating a PR from main",
"max_score": 25
},
{
"name": "Explains why it's blocked",
"description": "Explains that the release process requires a feature branch, not main — provides the reasoning, not just a refusal",
"max_score": 20
},
{
"name": "Provides recovery steps",
"description": "Suggests a concrete fix: create a branch from the current state (e.g., `git checkout -b fix/hotfix-name`), then proceed with the normal release flow from the new branch",
"max_score": 25
},
{
"name": "Does not create PR from main",
"description": "The output does NOT include `gh pr create` or `git push -u origin main` — the agent refuses to execute the release steps from main",
"max_score": 20
},
{
"name": "Preserves the commit",
"description": "Recovery guidance preserves the existing commit — does NOT suggest discarding the fix or resetting main",
"max_score": 10
}
]
}
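
The behavior this checklist rewards can be sketched as a small pre-flight guard. The following is only an illustration, not the policy's actual implementation: the names `release_guard`, `current_branch`, and `PROTECTED_BRANCHES` are hypothetical, and the default branch name `fix/hotfix-name` is taken from the example in the rubric. The guard blocks the release flow on main, explains why, and suggests a recovery step that keeps the existing commit.

```python
import subprocess

# Assumption: the rubric only names "main" as the branch to block.
PROTECTED_BRANCHES = {"main"}


def current_branch() -> str:
    """Return the name of the checked-out branch by asking git."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def release_guard(branch: str, new_branch: str = "fix/hotfix-name") -> dict:
    """Decide whether the release flow may proceed from `branch`.

    Hypothetical helper: returns a dict with a `blocked` flag, a
    human-readable `reason`, and `recovery` commands. The recovery
    only creates a new branch from the current state, so the existing
    commit is carried over and nothing on main is reset or discarded.
    """
    if branch not in PROTECTED_BRANCHES:
        return {"blocked": False, "reason": "", "recovery": []}
    return {
        "blocked": True,
        "reason": (
            f"The release process requires a feature branch, "
            f"but the current branch is '{branch}'."
        ),
        "recovery": [f"git checkout -b {new_branch}"],
        "next_step": "Continue the normal release flow from the new branch.",
    }
```

A caller would typically pass `current_branch()` into `release_guard` and stop before any push or PR step whenever `blocked` is true. Keeping the decision in a pure function, separate from the git call, makes it testable without a repository.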