Production-grade platform engineering handbook — Kubernetes, Terraform, Flux CD, GitHub Actions, AWS, and more.
64
80%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Advisory
Suggest reviewing before use
Read in order. Stop as soon as runtime + framework + deploy target are clear.
head -30 README.md 2>/dev/null
ls -1
# Language/runtime detection — read first match, stop when runtime is clear
cat package.json 2>/dev/null \
|| cat pyproject.toml 2>/dev/null \
|| cat go.mod 2>/dev/null \
|| cat Cargo.toml 2>/dev/null \
|| cat pom.xml 2>/dev/null \
|| cat build.gradle 2>/dev/null \
|| cat build.gradle.kts 2>/dev/null \
|| cat settings.gradle.kts 2>/dev/null \
|| cat Gemfile 2>/dev/null \
|| ls *.csproj 2>/dev/null \
|| cat composer.json 2>/dev/null
{ f=$(ls -t .github/workflows/*.yml 2>/dev/null | head -1); [ -n "$f" ] && head -40 "$f"; }
grep -s 'ENTRYPOINT\|CMD' Dockerfile 2>/dev/null | tail -3
test -f .vscode/extensions.json && cat .vscode/extensions.json
test -f CLAUDE.md && echo "CLAUDE.md present"
test -d .cursor && echo ".cursor/ present"
test -f agents/openai.yaml && echo "agents/openai.yaml present"
ls .github/agents/*.agent.md 2>/dev/null && echo "Copilot agents already set up"
grep -l "## Agent Context" CLAUDE.md 2>/dev/null && echo "Claude Code agent sections in CLAUDE.md"
head -1 AGENTS.md 2>/dev/nullhead -1 AGENTS.md 2>/dev/null<!-- generated by platform-skills --> → managed; overwrite freelyHere's what I found — correct anything wrong:
Runtime: <detected or "not detected">
Framework: <detected or "not detected">
Deploy: <inferred from workflow or Dockerfile>
Tests: <framework> (<CI: yes/no>)
IaC: <detected or "none">
AI tools: <detected — e.g. Claude Code, Cursor>
Anything wrong?If no AI tools detected: generate configs for all tools — developer deletes what they don't need.
Ask until you can write agents a developer would trust. Stop when you have enough. Some repos need 2 questions. Some need 10.
Lead with this question first:
Walk me through the last change you shipped. What did that involve?
This reveals: deploy process, review gates, test requirements, environments, approvals, failure modes. One answer replaces a 5-question form. The README tells you what someone wanted the repo to be — this tells you what it actually is.
Follow-up starting points — use as needed, not as a checklist:
If monorepo detected (multiple top-level services or packages):
I see this is a monorepo with multiple services.
Does your team own all of it, or are there areas owned by other teams?
If other teams own areas — which paths, and should agents treat those as off-limits?Encode team ownership in AGENTS.md off-limits with ownership context, not just paths.
If after scan + interview the skill can't populate ≥3 real facts for any agent's ## How to work here section:
I don't have enough signal to write agents with real knowledge of this repo.
Options:
1. Answer a few more questions so I have enough to write something real
2. Generate minimal stubs you fill in manually
3. Skip agents — generate AGENTS.md only as shared context any tool can read
Which would you prefer?Don't generate shallow agents — they create false confidence.
Use the table in references/setup-agents.md. State roster + reasoning, confirm before generating:
I'll generate: coordinator, app (FastAPI detected), test-writer (pytest, mentioned as painful)
I'm offering: navigator (helps new team members and code review)
I won't generate: infra (no IaC), platform (CI is support tooling, not the product)
Does this look right? Add navigator?After the roster is confirmed, show the model menu and suggest a model per agent role. The developer should see costs before deciding — not after.
Fetch current pricing before showing this menu — never display stale numbers.
Fetch from the provider's official pricing page at generate time:
https://docs.anthropic.com/en/docs/about-claude/models (may redirect to docs.claude.com — follow the redirect)https://openai.com/api/pricing/If the fetch succeeds, mark figures as "fetched now — verify before billing decisions". Scraped price tables can lag behind actual billing by hours or days when a provider updates their page. Never present a scraped figure as authoritative.
If fetch fails, show model names and tiers only — omit the price column and note that the developer should check current pricing themselves.
This step is skippable. If the developer wants to defer model selection (--no-model-menu or equivalent signal), use the default suggestions from the role table below and move on. Model selection is the one hard network dependency in the hot path — don't block the whole flow on it.
Model menu template (populate from fetch, or omit price column on failure):
Available models (prices per 1M tokens, input → output — fetched now):
Anthropic (fetch from https://docs.anthropic.com/en/docs/about-claude/models)
<opus-latest> $<in> → $<out> Most capable — complex reasoning, agentic coding
<sonnet-latest> $<in> → $<out> Best speed/intelligence balance ✦ recommended default
<haiku-latest> $<in> → $<out> Fastest, lowest cost
OpenAI (fetch from https://openai.com/api/pricing/)
<model> $<in> → $<out> <description, context window>
Which model should each agent use?
coordinator → <sonnet-latest> (routing + planning, needs broad judgment)
app → <sonnet-latest> (code tasks, balanced capability/cost)
infra → <opus-latest> (⚠ high blast radius — stronger reasoning pays off here)
test-writer → <haiku-latest> (repetitive generation; speed and cost matter more)
navigator → <haiku-latest> (read-only Q&A; no depth needed)
Accept suggestions? (y / change one / change all)Populate <opus-latest>, <sonnet-latest>, <haiku-latest> with the current model IDs and prices from the fetch. If Anthropic adds a new tier since this was written, include it — the fetch is authoritative, not this template.
Suggestion rules by role:
| Role | Suggested tier | Reason |
|---|---|---|
infra, deploy | Opus (most capable) | High blast radius — a wrong infra decision costs far more than the token difference |
coordinator | Sonnet (balanced) | Multi-agent routing + planning; needs broad judgment at reasonable cost |
app, platform, data-pipeline, ml | Sonnet (balanced) | Code quality and context-aware decisions |
test-writer, reviewer | Haiku (fast) | Repetitive or read-only; fast and cheap is the right trade-off |
navigator | Haiku (fast) | Read-only Q&A; no autonomous action |
On "change one": ask which agent, then ask for the model name. Accept free text — the developer may use a model not in this list (Bedrock cross-region inference, Azure, Vertex, etc.).
Write model: only where the tool supports it in frontmatter:
.agent.md: write model: in YAML frontmatter ✓agents/openai.yaml: write model: under the agent key ✓.mdc: no model: field — omit it entirelyCLAUDE.md: no frontmatter — omit it entirely<!-- setup-agents metadata --> block so upgrade mode can re-surface themMarker syntax is format-dependent — see references/setup-agents-schemas.md → "Managed-file marker" for the per-format rule. Never apply <!-- ... --> to YAML or JSON targets.
Overwrite guard — apply before writing any pre-existing agent file:
managed() { head -1 "$1" 2>/dev/null | grep -q 'generated by platform-skills'; }For each target file that already exists:
managed "$f" → overwrite freely (managed by this tool)! managed "$f" → skip and warn: "⚠️ $f exists and is hand-authored — skipping. Inspect manually."This applies to: .github/agents/*.agent.md, .cursor/rules/*.mdc, agents/openai.yaml. It does not apply to CLAUDE.md (always append-only) or .vscode/settings.json / .claude/settings.json (always merge).
See references/setup-agents-prompts.md for 3-section format and navigator pattern.
See references/setup-agents-schemas.md for frontmatter per tool and marker syntax.
See references/setup-agents-template.md for AGENTS.md full structure.
Generate sub-agents first. Coordinator last (knows every agent's handoffs).
Use asset templates for structural output — do not regenerate them verbatim:
| Output | Asset | Key tokens |
|---|---|---|
Copilot VS Code .agent.md frontmatter | assets/frontmatter/copilot-vscode.tmpl | DESCRIPTION, MODEL, TOOLS |
Copilot cloud .agent.md frontmatter | assets/frontmatter/copilot-cloud.tmpl | DESCRIPTION, MODEL, TOOLS |
Cursor .mdc frontmatter | assets/frontmatter/cursor.tmpl | DESCRIPTION, GLOBS, ALWAYS_APPLY |
agents/openai.yaml | assets/frontmatter/codex.tmpl | DISPLAY_NAME, SHORT_DESCRIPTION, DEFAULT_PROMPT, AGENT_ENTRIES |
CLAUDE.md Agent Context section | assets/frontmatter/claude-section.tmpl | AGENT_TABLE_ROWS |
| Navigator agent body | assets/agents/navigator.template.md | REPO_NAME, ENTRY_POINTS, etc. |
AGENTS.md | assets/AGENTS.md.template | all interview answers |
.vscode/mcp.json | assets/mcp/vscode-settings.json.tmpl | copy as-is |
.claude/settings.json MCP block | assets/mcp/claude-settings.json.tmpl | merge into existing |
Render with assets/render.sh — substitute __TOKEN__ placeholders. The model writes only the ## How to work here body for each agent — all structural skeleton comes from the template.
Active staleness guard: Before writing any agent file, verify every path referenced in ## How to work here exists in the current file tree. Fix inline — don't write a dead reference.
At the bottom of AGENTS.md, after all sections:
<!-- setup-agents metadata
generated: YYYY-MM-DD
q1: |
<verbatim answer to last-change-shipped question>
off-limits: |
<verbatim answer to never-want-agent-to-do question>
pain-points:
- "<from interview>"
models:
coordinator: <model-id chosen by developer>
app: <model-id chosen by developer>
infra: <model-id chosen by developer>
-->Use YAML literal block scalars (|) — free text is safe. Quoted YAML strings break on colons, #, backticks, and embedded quotes. Upgrade mode reads this block to warm the session without a cold start.
Note to implementer: When writing this block, do NOT use inline quoted strings for q1/off-limits answers. Always use
|literal blocks. The developer's answer will contain colons, punctuation, and special characters that corrupt inline YAML.
Do NOT regenerate verify-agents.sh. Copy the canonical static asset:
cp <skill-install-path>/assets/verify-agents.sh scripts/verify-agents.sh
chmod +x scripts/verify-agents.shThe script reads .platform-skills/manifest to know which checks to run. Write the manifest now, one token per line matching the confirmed output roster:
mkdir -p .platform-skills
cat > .platform-skills/manifest <<'EOF'
# generated by platform-skills setup-agents
<token-per-confirmed-target>
EOFValid tokens: copilot-vscode, copilot-cloud, copilot-app, cursor, codex, windsurf, vscode-mcp. See references/setup-agents.md → ".platform-skills/manifest format" for the full reference.
Tell the developer: "Run bash scripts/verify-agents.sh any time. Add it to CI — it exits non-zero on missing references so staleness is caught at PR time."
git rev-parse --git-dir &>/dev/null || { echo "Not a git repo — run generate instead."; exit 1; }
git log --oneline -1 &>/dev/null || { echo "No commits yet — run generate instead."; exit 1; }sed -n '/<!-- setup-agents metadata/,/-->/p' AGENTS.md 2>/dev/nullUse sed with a delimited range — grep -A20 silently truncates the block once add-mode additions: entries and multi-line q1/off-limits literal blocks push it past 20 lines.
Use interview answers to warm the session:
claude-opus-4 for infra and claude-haiku-4 for test-writer — keep those?" Allow changing any.# Read which tool targets exist — consistent with how verify-agents.sh discovers them.
# This tells upgrade which paths to scan without hardcoding assumptions.
MANIFEST_TARGETS=$(grep -vE '^#|^[[:space:]]*$' .platform-skills/manifest 2>/dev/null)
if [ -z "$MANIFEST_TARGETS" ]; then
echo "ℹ️ .platform-skills/manifest not found — inferring from directory presence"
# Fallback for repos set up before manifest support
ls .github/agents/*.agent.md 2>/dev/null && MANIFEST_TARGETS="$MANIFEST_TARGETS copilot-vscode"
ls .cursor/rules/*.mdc 2>/dev/null && MANIFEST_TARGETS="$MANIFEST_TARGETS cursor"
test -f agents/openai.yaml && MANIFEST_TARGETS="$MANIFEST_TARGETS codex"
fi
# Find coordinator — check all five output targets, not just Copilot/Cursor.
COORD=$(ls .github/agents/coordinator.agent.md \
.cursor/rules/coordinator.mdc 2>/dev/null | head -1)
# Codex-only repos: openai.yaml holds all agents — read the whole file
[ -z "$COORD" ] && [ -f agents/openai.yaml ] && COORD=agents/openai.yaml
# Claude Code only: Agent Context table in CLAUDE.md
[ -z "$COORD" ] && grep -q "## Agent Context" CLAUDE.md 2>/dev/null && COORD=CLAUDE.md# git log is primary — more reliable than stat (survives timestamp resets on clone)
AGENT_DATE=$(git log -1 --format=%ai -- "$COORD" 2>/dev/null | cut -d' ' -f1)
# python3 mtime is cross-platform fallback (stat -f is macOS-only, stat -c is Linux-only)
if [ -z "$AGENT_DATE" ] && [ -n "$COORD" ]; then
AGENT_DATE=$(python3 -c "import os,datetime; \
print(datetime.datetime.fromtimestamp(os.path.getmtime('$COORD')).strftime('%Y-%m-%d'))" 2>/dev/null)
fi
AGENT_DATE="${AGENT_DATE:-1970-01-01}"CHANGED=$(git log --since="$AGENT_DATE" --name-only --pretty=format: | sort -u | grep -v '^$')
COUNT=$(echo "$CHANGED" | wc -l)
if [ "$COUNT" -gt 50 ]; then
echo "⚠️ $COUNT files changed since agents were last updated."
echo "Showing files most likely to affect agents:"
echo "$CHANGED" | grep -E '^(src|tests|lib|app|.github|terraform|helm|Dockerfile|package\.json|pyproject\.toml|go\.mod)'
else
echo "$CHANGED"
fiPresent actionable punch list, then ask:
Files changed since agents were last updated:
src/routers/payments.py → app agent Knowledge may be stale
terraform/modules/rds/ → infra agent doesn't cover this new module
.github/workflows/deploy.yml → deploy process may have changed
Show proposed updates? (y/n/apply-all)# Cursor agent files exist but .cursor/ directory is gone?
if ls .cursor/rules/*.mdc 2>/dev/null | grep -q .; then
test -d .cursor || echo "⚠️ .cursor/rules/*.mdc files exist but .cursor/ is gone — team moved away from Cursor?"
fi
# Copilot agent files exist but no VS Code or Copilot config?
if ls .github/agents/*.agent.md 2>/dev/null | grep -q .; then
test -f .vscode/extensions.json || echo "ℹ️ Copilot agent files exist but no .vscode/extensions.json detected"
fiPresent each change individually, not all at once:
Update 1/3: app agent — add new router src/routers/payments.py to ## How to work here
Apply? (y/n/apply-all/skip-all)y → apply, move to next itemn → skip this item, explain why it was flagged, move onapply-all → apply all remaining items without further promptsskip-all → stop applying, summarise what was skippedDo not batch all changes into one write. Some items may be intentionally stale (developer wants to keep old context). Per-item approval is non-negotiable.
.claude-plugin
.github
assets
commands
docs
examples
agent-self-improve
argocd
awesome-docs
aws
cloudfront
functions
lambda-edge
functions
azure
compliance
conventional-commits
datadog
llm-observability
demo
documentation
dora
dynatrace
fluxcd
github-actions
composite-actions
configure-cloud
db-migrate
docker-build-push
k8s-deploy
notify-slack
pr-comment
release-tag
security-scan
setup-env
setup-terraform
terraform-plan
helm
web-service
templates
karpenter
kubernetes
kyverno
mcp
observability
openshift
pr-review
ownership
runtime-security
setup-agents
terraform
references
scripts
skills
platform-skills
tests