Rules and skills that teach AI agents how to contribute to open source projects without being the villain.
Score 95 · Best practices 91% · Impact 96% · 3.55x average score across 20 eval scenarios · Advisory: suggest reviewing before use
{
  "context": "Tests detection depth on the body/template-compliance rubric for an already-existing pull request. Target: good-oss-citizen/demo-streamqueue PR #8. The repo has a single PR template at .github/PULL_REQUEST_TEMPLATE.md with the sections Summary, Related Issue, Type of Change, AI Assistance, Testing, and Checklist. PR #8 fills every required section, including AI Assistance as `Tool: None`, `Used for: N/A`, `Human-verified: Yes — wrote and reviewed every line manually`. However, the same body ends with a `🤖 Generated with [Claude Code]` line and a `Co-Authored-By: Claude <noreply@anthropic.com>` trailer: a body-local contradiction that makes the AI Assistance answer unreliable. The tile-prescribed outcome is `Slight deviation` (the template is structurally complete, but a required answer is unreliable due to a body-local contradiction) plus a focused comment asking the author to reconcile the AI Assistance declaration with the trailer. Baseline agents tend to skim past the trailer, accept the body as compliant, or produce a long generic template request instead of a focused clarification. The 'drafts a comment' and 'does not post' criteria are given away ('bled') by the task language and have been removed; what remains tests tile-specific behaviors.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Uses template-compliance flow for existing PR body",
      "description": "The agent treats the task as checking an already-existing PR body against the repo's PR template, not as preparing a new PR, reviewing code correctness, or editing template files. Strong evidence is visible in the final answer when it applies template-compliance concepts and references PR #8; do not require hidden tool-call or activation telemetry.",
      "max_score": 8
    },
    {
      "name": "Fetches PR template and existing PR body",
      "description": "The final answer or visible transcript shows the agent used both the repo's PR template and the existing PR body for #8. Strong final-answer evidence includes naming the PR template's sections/items and accurately citing the body's populated content. If tool-call evidence is visible, the ideal commands are `github.sh templates-pr good-oss-citizen/demo-streamqueue` and `github.sh body good-oss-citizen/demo-streamqueue 8`, but do not require hidden tool-call telemetry. This specifically exercises the new `body` command's pull_request classification path.",
      "max_score": 12
    },
    {
      "name": "Selects .github/PULL_REQUEST_TEMPLATE.md",
      "description": "The agent compares PR #8 against `.github/PULL_REQUEST_TEMPLATE.md`. It does not use issue templates, invent a PR structure, or ignore the template because the PR is already closed.",
      "max_score": 6
    },
    {
      "name": "Detects the AI Assistance contradiction",
      "description": "The agent identifies that the body's AI Assistance section declares `Tool: None` / `Human-verified: Yes — wrote and reviewed every line manually`, while the same body ends with a Claude Code attribution and a `Co-Authored-By: Claude` trailer. It frames this as a body-local contradiction that makes the AI disclosure answer unreliable. Tile-specific: baseline agents may notice the trailer but accept the body as compliant or treat the trailer as boilerplate; the rubric prescribes flagging the inconsistency as a required-answer-unreliable case.",
      "max_score": 22
    },
    {
      "name": "Classifies as Slight deviation, not match or significant",
      "description": "Reports `Result: Slight deviation` because the body is structurally complete but a required answer is unreliable due to a body-local contradiction. `Matches well enough` is wrong because the contradiction must be reconciled. `Significant deviation` is also wrong because this is one focused clarification, not a broad template rewrite. Tile-specific: this exercises the rubric's bucket vocabulary.",
      "max_score": 10
    },
    {
      "name": "Drafts a focused clarification comment, not a generic template request",
      "description": "The suggested comment asks the PR author to reconcile the AI Assistance answer with the Claude Code attribution / `Co-Authored-By` trailer (or to remove whichever does not reflect actual usage). It is one focused ask, not a long nit list of unrelated template items, and not `No comment needed`. Tile-specific: the rubric's 'Slight deviation' guidance is to draft a concise clarification rather than a full-template rewrite.",
      "max_score": 12
    },
    {
      "name": "Does not over-list nits or unrelated template items",
      "description": "The agent does not produce a long enumeration of every checked/unchecked checkbox or unrelated template detail. It surfaces the contradiction and stays focused. Tile-specific: the rubric tells the agent to ask the author to reconcile rather than enumerate; baseline agents reflexively itemize.",
      "max_score": 6
    },
    {
      "name": "Does not ask for information already present",
      "description": "Does not ask the author to add Summary, Related Issue, Type of Change, AI Assistance, Testing, or Checklist headings generically; those are already populated. The ask is targeted at the inconsistency only.",
      "max_score": 4
    }
  ]
}
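The body-local signal the highest-weighted checklist item rewards can be sketched in a few lines. The helper below is hypothetical (the eval harness is not part of this file); it only illustrates detecting a `Tool: None` declaration that coexists with an AI attribution trailer in the same PR body:

```python
import re

def has_ai_disclosure_contradiction(body: str) -> bool:
    """Flag the body-local contradiction the rubric targets: an
    AI Assistance section declaring no AI tooling while the same
    body carries an AI attribution line or co-author trailer."""
    declares_no_ai = re.search(r"Tool:\s*None", body, re.IGNORECASE) is not None
    has_ai_attribution = (
        "Co-Authored-By: Claude" in body
        or "Generated with [Claude Code]" in body
    )
    return declares_no_ai and has_ai_attribution

# PR #8-style body: section says no AI, trailer says otherwise.
pr_body = (
    "## AI Assistance\n"
    "Tool: None\n"
    "Used for: N/A\n"
    "Human-verified: Yes\n\n"
    "🤖 Generated with [Claude Code]\n"
    "Co-Authored-By: Claude <noreply@anthropic.com>\n"
)
print(has_ai_disclosure_contradiction(pr_body))  # → True
```

A body that declares `Tool: None` with no attribution trailer, or a trailer with an honest AI disclosure, would return False; only the combination is the required-answer-unreliable case.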