Auto-generated tile from GitHub (10 skills)
92
94%
Does it follow best practices?
Impact
92%
1.16x
Average score across 44 eval scenarios
Advisory
Suggest reviewing before use
{
"context": "Tests whether the agent correctly structures AGENTS.md for a large monorepo with distinct packages. The agent should recommend or produce module-level AGENTS.md files rather than a single monolithic root file, apply the discoverability filter to exclude README-derivable content, use the recommended section structure, and identify operationally significant guidance from CI/workflow files and package configs.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Hierarchical structure",
"description": "The agent creates or explicitly recommends separate AGENTS.md files for individual packages (e.g., packages/api/AGENTS.md, packages/web/AGENTS.md) rather than placing everything in a single root AGENTS.md",
"max_score": 14
},
{
"name": "Scope routing section",
"description": "The root AGENTS.md includes a section that explains which packages or directories have their own AGENTS.md files and should be consulted for module-specific guidance (e.g., a 'Scope & routing' section or equivalent)",
"max_score": 12
},
{
"name": "No tech stack in root",
"description": "The root AGENTS.md does NOT list the technology stack (languages, frameworks, libraries) — this information is in README.md and discoverable",
"max_score": 10
},
{
"name": "No directory structure",
"description": "The root AGENTS.md does NOT reproduce the directory/package layout — discoverable by reading the repo tree",
"max_score": 8
},
{
"name": "Recommended section name",
"description": "At least one AGENTS.md file (root or package-level) uses one of these section headings: 'Non-discoverable commands', 'Landmines / do-not-touch areas', 'Task-specific constraints', or 'Landmines'",
"max_score": 9
},
{
"name": "CI quirk captured",
"description": "At least one AGENTS.md file includes the non-obvious test execution requirement from the CI workflow: that the api package integration tests require DATABASE_RESET=1 to be set or they will fail silently with zero failing assertions",
"max_score": 14
},
{
"name": "No generic advice",
"description": "No AGENTS.md file produced by the agent contains generic best-practice advice such as 'write tests', 'use meaningful variable names', or 'follow the style guide'",
"max_score": 9
},
{
"name": "Source files checked",
"description": "The agent's output references or incorporates content from at least two non-README source files (e.g., CI workflow, package.json scripts, Makefile) — evidenced by guidance that could only come from reading those files",
"max_score": 10
},
{
"name": "CLI landmine retained",
"description": "At least one AGENTS.md file (root or packages/cli/AGENTS.md) includes the non-discoverable warning that the CLI's --dry-run flag does NOT prevent writes to the config file at ~/.appname/config.json",
"max_score": 14
}
]
}
evals
scenario-1 through scenario-44
skills
documentation
fastify
init
linting-neostandard-eslint9
node
nodejs-core
rules
oauth
octocat
snipgrapher
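The weighted_checklist criteria file above gives each check a max_score, and the nine values sum to 100. The page does not say how a grader turns those weights into a final percentage, so the following is only a minimal sketch under one assumption: a grader awards each check between 0 and its max_score points, and the scenario score is the awarded total divided by the total weight. The function names and the `awarded` mapping are hypothetical; only the check names and max_score values come from the criteria file.

```python
import json

# Abbreviated copy of the criteria file: names and max_score values are taken
# from the document; descriptions are omitted for brevity.
CRITERIA_JSON = """
{
  "type": "weighted_checklist",
  "checklist": [
    {"name": "Hierarchical structure", "max_score": 14},
    {"name": "Scope routing section", "max_score": 12},
    {"name": "No tech stack in root", "max_score": 10},
    {"name": "No directory structure", "max_score": 8},
    {"name": "Recommended section name", "max_score": 9},
    {"name": "CI quirk captured", "max_score": 14},
    {"name": "No generic advice", "max_score": 9},
    {"name": "Source files checked", "max_score": 10},
    {"name": "CLI landmine retained", "max_score": 14}
  ]
}
"""


def total_weight(criteria: dict) -> int:
    """Sum of max_score over every checklist item."""
    return sum(item["max_score"] for item in criteria["checklist"])


def score_percent(criteria: dict, awarded: dict) -> float:
    """Return the awarded points as a percentage of the total weight.

    `awarded` maps a check name to the points given by a (hypothetical)
    grader. Checks missing from `awarded` count as zero points.
    """
    earned = 0
    for item in criteria["checklist"]:
        points = awarded.get(item["name"], 0)
        if not 0 <= points <= item["max_score"]:
            raise ValueError(f"points out of range for {item['name']!r}")
        earned += points
    return 100 * earned / total_weight(criteria)


criteria = json.loads(CRITERIA_JSON)

# Hypothetical grading: full marks everywhere except the CI quirk check.
awarded = {item["name"]: item["max_score"] for item in criteria["checklist"]}
awarded["CI quirk captured"] = 0

print(total_weight(criteria))
print(score_percent(criteria, awarded))
```

Under this assumption, missing the DATABASE_RESET=1 requirement alone would cost 14 of 100 points, the same as missing the hierarchical structure or the --dry-run landmine, which are the three heaviest checks in the file.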