General-purpose coding policy for Baruch's AI agents
90
91%
Does it follow best practices?
Impact
90%
1.30x average score across 18 eval scenarios
Advisory
Suggest reviewing before use
{
"context": "Tests whether the agent, when invoking the adopt-fork-pr skill on a same-repo (non-fork) PR, correctly passes through (recognizing that the installed reviewer already covers the PR and reporting its status) instead of running the fork-adoption flow. Because the skill triggers on generic 'check PR N' phrasing, the guarded behavior under test is that it does NOT create a parallel branch or duplicate PR for a PR that is already reviewable.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Identifies the PR as originating in the repository itself",
"description": "Determines that PR #12's head branch lives in the repository (not a fork) — e.g. via `gh pr view --json isCrossRepository` or equivalent — and treats that as the deciding fact for how to handle it",
"max_score": 25
},
{
"name": "Recognizes the reviewer already covers it",
"description": "States that the installed policy reviewer already runs on this PR, so no extra wiring or branch work is needed to get it reviewed",
"max_score": 25
},
{
"name": "Creates no branch and pushes nothing",
"description": "Does NOT create a new branch, check out the PR onto a new branch, or push anything to the remote. No adoption mechanics are invoked",
"max_score": 20
},
{
"name": "Opens no duplicate PR",
"description": "Does NOT open a second pull request mirroring #12. The existing PR is the one that proceeds",
"max_score": 15
},
{
"name": "Reports the PR's status",
"description": "Reads and reports the PR's current review verdict and/or check status (e.g. `gh pr view --json reviewDecision,statusCheckRollup` or `gh pr checks`) as the actual response to the request",
"max_score": 15
}
]
}

.tessl-plugin
evals
  scenario-1
  scenario-2
  scenario-3
  scenario-4
  scenario-5
  scenario-6
  scenario-7
  scenario-8
  scenario-9
  scenario-10
  scenario-11
  scenario-12
  scenario-13
  scenario-14
  scenario-15
  scenario-16
  scenario-17
  scenario-18
rules
skills
  adopt-fork-pr
  eval-curation
  install-reviewer
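The first and last criteria in the checklist above both come down to reading `gh pr view --json` output. The following is a minimal sketch of the pass-through decision, assuming the agent has already run something like `gh pr view 12 --json isCrossRepository,reviewDecision` and captured its JSON. The function `triage_pr` and its return strings are hypothetical illustrations, not part of the adopt-fork-pr skill.

```python
import json


def triage_pr(view_json: str) -> str:
    """Hypothetical helper: decide how to handle a PR from `gh pr view --json` output.

    Assumes `view_json` contains at least the `isCrossRepository` and
    `reviewDecision` fields requested via `gh pr view --json`.
    """
    pr = json.loads(view_json)
    if pr["isCrossRepository"]:
        # Head branch lives in a fork: the fork-adoption flow applies.
        return "adopt"
    # Same-repo PR: the installed reviewer already covers it, so no branch,
    # no push, and no duplicate PR. Just report the current review verdict.
    # An empty or missing reviewDecision is reported as NONE.
    decision = pr.get("reviewDecision") or "NONE"
    return f"pass-through (reviewDecision: {decision})"


if __name__ == "__main__":
    same_repo = '{"isCrossRepository": false, "reviewDecision": "APPROVED"}'
    print(triage_pr(same_repo))
```

Taking the JSON as a string rather than shelling out keeps the decision logic testable without `gh` installed. Check status, which the last criterion also accepts, can be read separately with `gh pr checks`.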