General-purpose coding policy for Baruch's AI agents
95
91%
Does it follow best practices?
Impact
96%
1.31xAverage score across 10 eval scenarios
Advisory
Suggest reviewing before use
{
"context": "Tests whether the agent applies this tile's prescribed approach to requesting Copilot as a PR reviewer. The tile prescribes: GraphQL `requestReviews` mutation (REST drops bot reviewers silently); a pinned-bot-ID-with-fallback pattern (hardcoded `BOT_kgDOCnlnWA` for hot path, dynamic discovery via recent reviews when the pin goes stale); post-request verification that the bot appears in `requested_reviewers`. Baseline agents without the tile would reach for REST, not know the Copilot bot ID, or implement a dynamic-only lookup that drops the fast-path pin. Checking for these specific tile choices measures tile value.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Uses GraphQL `requestReviews` mutation",
"description": "The script calls `gh api graphql` with a `requestReviews` mutation. REST paths (`gh pr edit --add-reviewer`, `POST /repos/.../pulls/<N>/requested_reviewers`) score zero — those are the approaches that silently drop bots, which is the specific failure the tile's GraphQL choice exists to prevent",
"max_score": 18
},
{
"name": "Inline comment explains why REST doesn't work",
"description": "The script's inline comments cite the concrete reason GraphQL is required: REST drops bot reviewers silently. The tile requires inline comments for non-obvious steps; a separate README alone is insufficient",
"max_score": 8
},
{
"name": "Pinned bot ID with fallback to dynamic discovery",
"description": "The script uses a pinned Copilot bot ID (`BOT_kgDOCnlnWA`) AND a discovery fallback — query recent reviews for the bot's GraphQL node ID when the pinned ID is rejected by the API. Implementations that hardcode with NO fallback score half. Implementations that do dynamic-only discovery with no pin score half — the prescribed pattern uses both, for latency on the hot path and resilience when GitHub rotates the ID",
"max_score": 20
},
{
"name": "Resolves the PR's GraphQL node ID",
"description": "Before calling the mutation, the script fetches the PR's GraphQL node ID (via `pullRequest(number: N) { id }` or equivalent) rather than passing the integer PR number into a field that expects a node ID",
"max_score": 10
},
{
"name": "Verifies the review request was registered",
"description": "After the mutation, the script inspects the PR state to confirm the bot appears in `requested_reviewers` (or equivalent). Fire-and-forget without verification scores zero — the tile prescribes this post-check because GitHub sometimes accepts the mutation without actually registering the reviewer",
"max_score": 14
},
{
"name": "Feature-branch guard",
"description": "Before pushing, the script confirms the current branch is NOT main/master and fails loudly if it is",
"max_score": 10
},
{
"name": "PR title follows conventional-commits format",
"description": "The PR title the script constructs uses `<type>(<scope>): <imperative summary>` (conventional commits — widely adopted, not tile-invented, but the tile prescribes using it here)",
"max_score": 8
},
{
"name": "PR body structure",
"description": "The PR body template includes a Summary bullet list of what changed and why, and a Test plan as a markdown checklist — the tile's prescribed structure for PR bodies",
"max_score": 7
},
{
"name": "Pre-push readiness checks",
"description": "The script runs the project's tests AND linter before pushing, and fails if either fails. Acceptable regardless of how these are invoked; the tile prescribes this as a cross-cutting readiness gate",
"max_score": 3
},
{
"name": "No hardcoded inputs in the script body",
"description": "OBSERVABLE: the script body contains no literal for the target repository owner, repo name, branch name, PR title, PR body source, or PR number. All of these come from runtime inputs (args or env vars) per the task's input contract. Pinning any of them in the script source scores zero",
"max_score": 2
}
]
}