Auto-generated tile from GitHub (10 skills)
92
94%
Does it follow best practices?
Impact
92%
1.16x
Average score across 44 eval scenarios
Advisory
Suggest reviewing before use
{
  "context": "Tests whether the agent correctly applies the discoverability filter to a bloated AGENTS.md, removing discoverable or tooling-enforced content while retaining genuine landmines and non-standard tooling instructions. Evaluates that the resulting file uses the preferred section structure.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Tech stack removed",
      "description": "The output AGENTS.md does NOT contain a paragraph or section summarizing the tech stack (e.g. language, frameworks, or libraries used)",
      "max_score": 8
    },
    {
      "name": "Directory overview removed",
      "description": "The output AGENTS.md does NOT contain a directory structure listing or description of folder layout",
      "max_score": 8
    },
    {
      "name": "Generic advice removed",
      "description": "The output AGENTS.md does NOT contain generic software engineering advice (e.g. 'write clear commit messages', 'follow DRY principle', 'keep functions small')",
      "max_score": 8
    },
    {
      "name": "Linter-enforced rule removed",
      "description": "The output AGENTS.md does NOT contain the instruction about avoiding `var` declarations (already enforced by ESLint)",
      "max_score": 8
    },
    {
      "name": "DB-wipe landmine retained",
      "description": "The output AGENTS.md retains the warning that running `npm test` without setting NODE_ENV=test will wipe the dev database",
      "max_score": 12
    },
    {
      "name": "Non-standard tooling retained",
      "description": "The output AGENTS.md retains the instruction to use `pnpm` instead of `npm` for package operations",
      "max_score": 12
    },
    {
      "name": "Integration test caveat retained",
      "description": "The output AGENTS.md retains the caveat that integration tests require the `--forceExit` flag or they hang",
      "max_score": 12
    },
    {
      "name": "Recommended sections used",
      "description": "The output AGENTS.md uses at least ONE of the recommended section headings: 'Non-discoverable commands', 'Landmines', 'do-not-touch', or 'Task-specific constraints'",
      "max_score": 8
    },
    {
      "name": "No architecture narrative",
      "description": "The output AGENTS.md does NOT contain descriptions of how system components interact or architectural decisions inferable from code",
      "max_score": 8
    },
    {
      "name": "Result is shorter",
      "description": "The output AGENTS.md has fewer lines than the input AGENTS.md (bloat was removed)",
      "max_score": 8
    },
    {
      "name": "No boilerplate header",
      "description": "The output AGENTS.md does NOT start with a boilerplate intro paragraph explaining what AGENTS.md is or its general purpose",
      "max_score": 8
    }
  ]
}
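The page does not say how a `weighted_checklist` is turned into a single score. As a minimal sketch, assuming the result is simply the awarded points divided by the sum of `max_score` values (which total 100 for the checklist above), the aggregation could look like this. The function name `percent_score` and the `awarded` mapping are hypothetical and are not part of the tile or of any evaluator API.

```python
# Sketch only: assumes score = awarded points / total max_score.
# The real evaluator's aggregation rule is not documented on this page.

# (name, max_score) pairs copied from the checklist above.
CHECKLIST = [
    ("Tech stack removed", 8),
    ("Directory overview removed", 8),
    ("Generic advice removed", 8),
    ("Linter-enforced rule removed", 8),
    ("DB-wipe landmine retained", 12),
    ("Non-standard tooling retained", 12),
    ("Integration test caveat retained", 12),
    ("Recommended sections used", 8),
    ("No architecture narrative", 8),
    ("Result is shorter", 8),
    ("No boilerplate header", 8),
]


def percent_score(checklist, awarded):
    """Return awarded points as a percentage of the maximum.

    `awarded` maps criterion name -> points given by a grader.
    Missing criteria count as 0; values are clamped to [0, max_score].
    """
    max_total = sum(max_score for _, max_score in checklist)
    got = sum(
        min(max(awarded.get(name, 0), 0), max_score)
        for name, max_score in checklist
    )
    return 100.0 * got / max_total


# Hypothetical grading: full marks everywhere except the DB-wipe landmine.
awarded = {name: max_score for name, max_score in CHECKLIST}
awarded["DB-wipe landmine retained"] = 0
print(percent_score(CHECKLIST, awarded))  # 88.0
```

Under this assumption the three "retained" criteria carry 36 of the 100 points, so dropping a single landmine costs more than keeping a single piece of bloat.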
skills
documentation
fastify
init
linting-neostandard-eslint9
node
nodejs-core
rules
oauth
octocat
snipgrapher