Audit and build the infrastructure a repo needs so agents can work autonomously — boot scripts, smoke tests, CI/CD gates, dev environment setup, observability, and isolation. Use when a repo can't boot, tests are broken or missing, there's no dev environment, agents can't verify their work, or agents need human help to get anything done. Do not use for reviewing an existing diff or for documentation-only cleanup.
97
100%
Does it follow best practices?
Impact
87%
1.03xAverage score across 3 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent correctly audits a repository across the four agent-readiness dimensions (bootable, testable, observable, verifiable), assigns pass/partial/fail status with evidence and gaps, derives the overall grade as the lowest dimension, and produces an init.sh boot script following the health-check loop pattern.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Four dimensions present",
"description": "audit-report.md includes all four dimensions: bootable, testable, observable, and verifiable",
"max_score": 8
},
{
"name": "Status labels used",
"description": "Each dimension in audit-report.md is assigned one of the three statuses: pass, partial, or fail",
"max_score": 8
},
{
"name": "Evidence cited per dimension",
"description": "Each dimension in audit-report.md cites at least one specific file or command as evidence",
"max_score": 8
},
{
"name": "Gap described per dimension",
"description": "Each dimension that is not 'pass' in audit-report.md includes a gap describing what is missing",
"max_score": 8
},
{
"name": "Overall grade is lowest dimension",
"description": "audit-report.md states an overall grade that corresponds to the lowest-rated individual dimension (not an average)",
"max_score": 8
},
{
"name": "Mocks identified as zero testability",
"description": "audit-report.md identifies the existing mock-based test suite as not counting toward testability (mocked tests score zero)",
"max_score": 8
},
{
"name": "Background app start in init.sh",
"description": "scripts/init.sh starts the application process in the background (using & or equivalent) so the script can poll",
"max_score": 10
},
{
"name": "Health poll loop in init.sh",
"description": "scripts/init.sh contains a loop that polls the health or readiness endpoint repeatedly (at minimum 10 iterations) before declaring success",
"max_score": 10
},
{
"name": "Non-zero exit on failure",
"description": "scripts/init.sh exits with a non-zero status code (exit 1 or equivalent) if the app does not start successfully",
"max_score": 10
},
{
"name": "Strict mode in init.sh",
"description": "scripts/init.sh includes set -euo pipefail or equivalent strict mode at the top",
"max_score": 8
},
{
"name": "Layer ordering respected",
"description": "improvements.md or audit-report.md references addressing Boot before Smoke, and Smoke before higher layers — no higher layers built before boot is resolved",
"max_score": 7
},
{
"name": "No mock tests credited",
"description": "The report does NOT suggest the existing mock-based jest tests provide meaningful test coverage or constitute passing testability",
"max_score": 7
}
]
}