Creates, updates, or prunes an AGENTS.md for any repository by auditing the codebase, detecting non-discoverable gaps, and drafting minimal high-signal instructions that agents cannot infer from reading the code.
90
94%
Does it follow best practices?
Impact: 78% (1.06x average score across 3 eval scenarios)
Passed
No known issues
{
  "context": "Tests whether the agent correctly updates an existing AGENTS.md by pruning stale, redundant, and discoverable content rather than blindly preserving it, and produces a required changes summary documenting what was removed and why.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Changes summary file created",
      "description": "A changes-summary.md or audit-notes.md file exists documenting what was removed from the existing AGENTS.md",
      "max_score": 10
    },
    {
      "name": "Removals documented with reasons",
      "description": "The changes summary includes a reason for each removed item citing which repo file makes it discoverable or why it is stale/redundant",
      "max_score": 10
    },
    {
      "name": "Tech stack section removed",
      "description": "The updated AGENTS.md does NOT contain a tech stack summary section or language/framework descriptions (e.g., no 'Built with Python/FastAPI' or 'Uses PostgreSQL')",
      "max_score": 10
    },
    {
      "name": "Stale command removed",
      "description": "The updated AGENTS.md does NOT contain references to 'pipenv' or 'pipenv install' — the tool that was replaced by Poetry and is no longer used",
      "max_score": 10
    },
    {
      "name": "Generic advice removed",
      "description": "The updated AGENTS.md does NOT contain generic advice such as 'follow PEP 8', 'write docstrings', 'add tests for new features', or similar non-specific guidance",
      "max_score": 10
    },
    {
      "name": "Non-discoverable content preserved",
      "description": "The updated AGENTS.md still contains the non-discoverable migration constraint about the /api/v1/ endpoints being deprecated in favor of /api/v2/",
      "max_score": 10
    },
    {
      "name": "New finding added",
      "description": "The updated AGENTS.md includes the non-obvious '--no-ansi' flag required when running alembic in Docker environments (present in CI but not in local scripts)",
      "max_score": 10
    },
    {
      "name": "Correct heading structure maintained",
      "description": "The updated AGENTS.md uses the prescribed headings verbatim and includes a one-line scope statement after the main title heading",
      "max_score": 10
    },
    {
      "name": "Redundant directory overview removed",
      "description": "The updated AGENTS.md does NOT contain descriptions of what the app/, tests/, or migrations/ directories contain (discoverable from directory listing)",
      "max_score": 10
    },
    {
      "name": "Tooling-enforced rules not duplicated",
      "description": "The updated AGENTS.md does NOT restate rules already enforced by ruff, mypy, or pre-commit hooks",
      "max_score": 10
    }
  ]
}
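The rubric does not state how a weighted checklist is collapsed into one score. The ten items above each weigh 10, so their weights already sum to 100. The following is a minimal Python sketch that assumes each item earns between 0 and its `max_score` (partial credit allowed) and that the result is reported as a percentage. The names `weighted_total`, `mentions_pipenv`, and `RUBRIC_JSON` are hypothetical. The regex is only a crude stand-in for the "Stale command removed" item. Most other items, such as spotting a tech stack section or generic advice, need judgment that a string search cannot provide.

```python
import json
import re

# Hypothetical trimmed copy of the rubric; a real grader would load the full JSON.
RUBRIC_JSON = """
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "Stale command removed", "max_score": 10},
    {"name": "Generic advice removed", "max_score": 10},
    {"name": "Non-discoverable content preserved", "max_score": 10}
  ]
}
"""


def weighted_total(rubric: dict, awarded: dict) -> float:
    """Return the checklist score as a percentage of the total weight.

    Assumption: each item earns 0..max_score, and items missing from
    `awarded` earn 0.
    """
    earned = 0.0
    possible = 0.0
    for item in rubric["checklist"]:
        cap = item["max_score"]
        possible += cap
        # Clamp each award into [0, cap] so no item can exceed its weight.
        earned += min(max(awarded.get(item["name"], 0.0), 0.0), cap)
    return 100.0 * earned / possible if possible else 0.0


def mentions_pipenv(agents_md: str) -> bool:
    """Crude, hypothetical check: True if the text still mentions pipenv."""
    return re.search(r"\bpipenv\b", agents_md, flags=re.IGNORECASE) is not None


if __name__ == "__main__":
    rubric = json.loads(RUBRIC_JSON)
    # Hypothetical excerpt of an updated AGENTS.md.
    updated = "The /api/v1/ endpoints are deprecated; add new routes under /api/v2/.\n"
    awarded = {
        "Stale command removed": 0.0 if mentions_pipenv(updated) else 10.0,
        "Generic advice removed": 10.0,             # hypothetical grader verdict
        "Non-discoverable content preserved": 5.0,  # hypothetical partial credit
    }
    print(f"{weighted_total(rubric, awarded):.1f}%")  # prints 83.3%
```

Clamping each award keeps a single over-generous verdict from inflating the total. Normalizing by the summed weights means the sketch still works if items are later given unequal `max_score` values.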