Audit and build the infrastructure a repo needs so agents can work autonomously — boot scripts, smoke tests, CI/CD gates, dev environment setup, observability, and isolation. Use when a repo can't boot, tests are broken or missing, there's no dev environment, agents can't verify their work, or agents need human help to get anything done. Do not use for reviewing an existing diff or for documentation-only cleanup.
Eval results: score 97 · best-practice adherence 100% · impact 87% · 1.03x average score across 3 eval scenarios · passed · no known issues.
Concrete patterns for building each readiness layer. Substitute your project's actual tools.
Every project needs a single command to start. The tool doesn't matter — consistency does.
Boot the app and confirm it's alive. Run at the start of every agent session.
```bash
#!/usr/bin/env bash
# scripts/init.sh
set -euo pipefail
<your-boot-command> &
APP_PID=$!
for i in $(seq 1 30); do
  curl -sf http://localhost:${PORT:-3000}/health > /dev/null 2>&1 && break
  sleep 1
done
curl -sf http://localhost:${PORT:-3000}/health > /dev/null 2>&1 || {
  echo "ERROR: App failed to start"; kill $APP_PID 2>/dev/null; exit 1
}
echo "App is ready"
```

For services with dependencies (DB, Redis, queues), use Docker Compose with health checks:
```yaml
services:
  app:
    build: .
    ports: ["${PORT:-3000}:3000"]
    depends_on:
      db: { condition: service_healthy }
  db:
    image: postgres:16
    healthcheck:
      test: pg_isready
      interval: 2s
      timeout: 5s
      retries: 10
```

Boot: `docker compose up -d --wait`
Fast (< 5 seconds) check that the app is alive. Not user flows — just "did it start."
```bash
# HTTP service
curl -sf http://localhost:3000/health | jq .

# CLI tool
./dist/my-cli --version

# UI app (Playwright)
npx playwright test smoke.spec.ts
```

Key user flows on the real running app.
```bash
# UI flows
npx playwright test e2e/

# CLI golden-file check
diff output.json expected.json
```

Prefer these over large suites of unit tests that mock the seam under change. For agent verification, one honest integration or e2e check is usually worth more than many self-verifying mocked tests.
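A golden-file diff against the real binary is often the cheapest honest integration check. A minimal sketch — `my-cli`, its `process` subcommand, and the fixture paths are all assumptions to substitute:

```bash
# Run the real binary on a fixed input, compare against a checked-in expected output
./dist/my-cli process test/fixtures/input.json > /tmp/output.json
diff /tmp/output.json test/fixtures/expected.json
```

If `diff` exits non-zero, the agent sees exactly which fields drifted — no mocks, no test framework.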
```bash
#!/usr/bin/env bash
# .git-hooks/pre-push
set -euo pipefail
<your-lint-command>
<your-smoke-command>
```

Wire: `git config core.hooksPath .git-hooks`
Smoke + integration on every PR:
```yaml
# .github/workflows/verify.yml
on: [pull_request]
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: <your-boot-command>
      - run: <your-smoke-command>
      - run: <your-test-command>
```

Add dead-code and unused-symbol checks when the stack supports them. These tools are cheap, deterministic, and good at catching stale scaffolding that agents leave behind.
```bash
# TypeScript / JavaScript
npx knip

# Go
staticcheck ./...
```

For Go repos using golangci-lint, enable the unused and dead-code style analyzers there instead of inventing a separate wrapper.
Error messages should tell the agent how to fix the issue:
```js
meta: {
  messages: {
    noDirectFetch: 'Use the API client from lib/api instead of fetch(). See docs/api-conventions.md'
  }
}
```

Prefer mechanical checks for error-handling hygiene when the stack supports them.
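Examples of such mechanical checks — the specific tools named here are suggestions, and availability depends on your stack:

```bash
# Go: flag error return values that are never checked
errcheck ./...

# TypeScript: flag unawaited promises via typescript-eslint
# (enable @typescript-eslint/no-floating-promises in your ESLint config)
npx eslint .
```

Both are deterministic, so they belong in the same hook or CI gate as lint.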
Structured JSON logs + machine-readable health endpoints. This is what makes "Grade B" possible — agents can query results, not just read code.
```
# Structured log line
{"level":"info","ts":"...","msg":"request","method":"GET","path":"/api/items","status":200,"duration_ms":12}

# Health endpoint
GET /health → {"status":"ok","version":"1.2.3","uptime":3600}
```

Datadog's insight: observability isn't just for production. Wire it into the dev environment so agents can verify behavior through telemetry, not just test assertions.
Reproducible test state prevents non-deterministic failures:
```bash
# scripts/seed.sh
<your-db-reset-command>
<your-seed-command>
```

Keep fixtures in `fixtures/` or `test/fixtures/` — version with the repo.
For parallel agents on the same repo:
```bash
git worktree add ../feature-xyz -b feature-xyz origin/main
export PORT=$((3000 + $(echo "$PWD" | cksum | cut -d' ' -f1) % 1000))
export COMPOSE_PROJECT_NAME="app-$(basename $PWD)"
docker compose up -d --wait
```

Rules: no hardcoded ports, each worktree gets its own Docker Compose project, tear down after completion.
Always deterministic (hardcoded, no LLM): linting, formatting, branch creation, push, PR template, test runner invocation, Docker startup.
Agentic (LLM decides): understanding the task, implementation, fixing failures, deciding which files to change.
This split saves tokens, reduces errors, and guarantees critical steps happen every time.
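The split can be enforced with a thin deterministic wrapper around the agentic core. A sketch — the script name, branch convention, and `<your-*>` placeholders are assumptions to substitute:

```bash
#!/usr/bin/env bash
# scripts/run-task.sh — deterministic scaffolding; only the middle step is agentic
set -euo pipefail
task_slug="$1"
git checkout -b "agent/${task_slug}"      # deterministic: branch creation

# ... agent implements the change here (agentic: understanding, edits, fixes) ...

<your-lint-command>                        # deterministic: lint
<your-smoke-command>                       # deterministic: smoke test
git push -u origin "agent/${task_slug}"    # deterministic: push
```

The agent never decides whether to lint or push; the wrapper guarantees it.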
Run targeted checks when the agent finishes a task — before commit, not just in CI. Silent on success, error-only on failure to avoid context flooding.
```bash
# .git-hooks/pre-commit or agent stop hook
set -euo pipefail
# Silent on success; on failure, rerun to surface the last 20 lines of output
<your-typecheck-command> >/dev/null 2>&1 || <your-typecheck-command> 2>&1 | tail -20
<your-targeted-test-command> >/dev/null 2>&1 || <your-targeted-test-command> 2>&1 | tail -20
```

Pattern: run silently, only show output on failure. Run only tests related to changed files, not the full suite. Most test runners support file-pattern filtering.
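Filtering to changed files can be one line. A sketch using Jest's `--findRelatedTests`; substitute your runner's equivalent:

```bash
# Run only the tests that exercise files changed since the last commit
changed=$(git diff --name-only HEAD -- '*.ts' '*.tsx')
[ -n "$changed" ] && npx jest --findRelatedTests $changed
```

On a typical task this turns a multi-minute suite into a few seconds of feedback.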
Max 2 CI rounds. No infinite loops.
1. Agent implements change
2. Local lint + smoke (deterministic, < 5 seconds)
3. Push to CI — autofix known patterns on failure
4. One more attempt if unfixed
5. After 2nd CI failure → hand back to human

A PR that's 80% correct and an engineer polishes in 20 minutes > an agent retrying indefinitely at escalating token cost.
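The two-failure cap can be enforced mechanically rather than left to the agent's judgment. A sketch — the file-based counter and its path are assumptions:

```shell
#!/usr/bin/env bash
# Cap CI-fix attempts at 2, then escalate to a human
set -euo pipefail
attempts_file=".ci-attempts"
n=$(cat "$attempts_file" 2>/dev/null || echo 0)
if [ "$n" -ge 2 ]; then
  echo "CI failed twice; escalating to human review"
  exit 1
fi
echo $((n + 1)) > "$attempts_file"
echo "CI fix attempt $((n + 1)) of 2"
```

Reset the counter (e.g. `rm -f .ci-attempts`) when a new task starts, so one branch's failures don't block the next.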