Production-grade platform engineering handbook — Kubernetes, Terraform, Flux CD, GitHub Actions, AWS, and more.
A production-grade field handbook for platform, DevOps, SRE, and cloud engineers covering Kubernetes, Flux CD, Terraform, GitHub Actions, AWS, OPA/Rego, KEDA, supply chain security, Falco, observability, and more. Use it on GitHub, as a local reference, or with Claude, Codex, Cursor, and Copilot for interactive guidance with blast radius, validation steps, and rollback plans built in.
| Tool | What you get |
|---|---|
| Claude Code | Slash commands (/platform-skills:review, /platform-skills:debug, and 9 more), interactive guidance, automatic activation on relevant files |
| Codex | Skill invocation with $platform-skills, loaded on demand in any Codex session |
| Cursor | Project rules for Chat and Agent — platform review and generation in every file context |
| GitHub Copilot | Chat instructions committed to your repo — available to your whole team without individual installs |
| GitHub (no AI tool) | Browse references/ and examples/ directly — a standalone field handbook |
If this handbook saves you time, give it a star — it helps others find it. Found a gap or a better pattern? Contributions are welcome — open an issue, improve a reference guide, or add an example.
Platform teams keep rediscovering the same hard lessons: unclear ownership, unsafe IAM, weak Kubernetes defaults, drifting GitOps overlays, CI checks that run too late, and rollback plans that only appear after an incident. Platform Skills turns those lessons into reusable guidance for the tools engineers already use.
Use it when you need a second brain for production platform work:
Clone once, then install the integration your team uses:
git clone https://github.com/nitinjain999/platform-skills.git
cd platform-skills
| Tool | Best for | Quick install |
|---|---|---|
| Claude Code | Interactive plugin workflows and slash commands | claude plugin marketplace add https://github.com/nitinjain999/platform-skills && claude plugin install platform-skills |
| Codex | Local skill invocation with $platform-skills | ./install.sh --codex |
| Cursor | Project rules for Chat and Agent | ./install.sh --cursor --target ../your-project |
| GitHub Copilot | Team-wide chat instructions committed to the repo | ./install.sh --copilot --target ../your-project |
| Everything | Local all-agent setup | ./install.sh --all --target ../your-project |
Need manual setup, global editor rules, or troubleshooting? See INSTALLATION.md.
See BEFORE_AFTER.md for side-by-side before/after examples across Kubernetes, Terraform, Flux CD, GitHub Actions, OPA/Rego, and PR triage. More copy-paste workflows in PROMPTS.md.
Use $platform-skills to review this Terraform change for IAM scope, replacement risk, validation, and rollback.
Review this Kubernetes Deployment for production readiness: securityContext, resources, probes, HPA, PDB, and NetworkPolicy.
My Flux Kustomization is stuck NotReady. Walk me from evidence to fix to rollback.
Generate a production-ready GitHub Actions workflow with OIDC, pinned actions, cache safety, and least privilege.
This repository is a reference handbook for developers, DevOps engineers, SREs, cloud engineers, and platform teams. It is structured in independent layers:
- references/ and examples/ are the main product. Every domain has a deep-dive guide and working example assets you can copy directly into your project. Use it on GitHub, from a local clone, or as a team knowledge base.
- SKILL.md and .claude-plugin/marketplace.json add an optional routing layer so Claude surfaces the right section of the handbook when you ask platform engineering questions interactively.
- SKILL.md provides routing, agents/openai.yaml provides Codex UI metadata, and references/ plus examples/ are loaded on demand.
- .cursorrules and .cursor/rules/*.mdc give Cursor project-level and scoped file rules for platform engineering reviews and generation.
- .github/copilot-instructions.md lets teams commit the baseline into application and platform repositories.
All layers work independently. Agent integrations are optional.
| I want to... | Go to |
|---|---|
| Get started in 5 minutes | QUICKSTART.md |
| Understand how AI agents and skills work | HOW_IT_WORKS.md |
| Full installation guide and troubleshooting | INSTALLATION.md |
| Read a domain guide | references/ |
| Copy a working example | examples/ |
| Copy prompts for Claude, Codex, Cursor, or Copilot | PROMPTS.md |
| Install as a Claude plugin | Installation |
| Install as a Codex skill | Installation |
| Add Cursor rules | Editor integrations |
| Learn how to use each slash command | COMMANDS.md |
| Set up VSCode, Copilot, or Cursor | EDITOR_INTEGRATIONS.md |
| Contribute a pattern | CONTRIBUTING.md |
| Domain | Reference guide | What it covers |
|---|---|---|
| Kubernetes | references/kubernetes.md | Cluster baseline, workload patterns, network policy, RBAC, pod security |
| 🛡️ Kyverno | references/kyverno.md | Validate/mutate/generate/verifyImages policies, Audit→Deny promotion, PolicyException, PolicyReport, kyverno-cli testing, PSP/Gatekeeper migration |
| OpenShift | references/openshift.md | Routes, SCC compatibility, operator usage, tenant isolation |
| Argo CD | references/argocd.md | App-of-apps design, ApplicationSet, sync control, promotion flows |
| Flux CD | references/fluxcd.md | Monorepo structure, reconciliation, multi-tenancy, image automation |
| ↳ Flux CD Sources | references/fluxcd-sources.md | GitRepository, OCIRepository, HelmRepository, Bucket, ArtifactGenerator |
| ↳ Flux CD ResourceSets | references/fluxcd-resourcesets.md | ResourceSet templating, input strategies, gitless fleet patterns |
| ↳ Flux CD Notifications | references/fluxcd-notifications.md | Provider, Alert, Receiver, Slack/Datadog/GitHub commit status |
| ↳ Flux CD Operator | references/fluxcd-operator.md | FluxInstance sizing, multi-tenancy, kustomize patches, FluxReport |
| ↳ Flux CD Kustomization | references/fluxcd-kustomization.md | CEL readyExpr, postBuild substitution, SOPS, SSA annotations |
| ↳ Flux CD HelmRelease | references/fluxcd-helmrelease.md | chartRef vs chart.spec, drift detection, post-renderers, CRD lifecycle |
| ↳ Flux CD Terraform | references/fluxcd-terraform.md | Flux Operator bootstrap via Terraform |
| ↳ Flux CD MCP | references/fluxcd-mcp.md | AI-assisted FluxCD debugging via Flux MCP server |
| ↳ Flux CD Migration | references/fluxcd-migration.md | v2.7/v2.8 API removals, CLI and Operator upgrade paths |
| ↳ Flux CD Security | references/fluxcd-security.md | Secrets, source auth, OCI supply chain, RBAC, image automation security |
| ↳ Flux CD Troubleshooting | references/fluxcd-troubleshooting.md | Incident cheat-sheet — symptom → cause → fix per controller |
| AWS | references/aws.md | IAM least-privilege, IRSA, EKS, resource tagging, cost allocation |
| AWS CloudFront | references/aws-cloudfront.md | Distributions, OAC, cache policies, security headers, Lambda@Edge, CloudFront Functions, multi-account |
| AWS WAF | references/aws-waf.md | Web ACLs, managed rule groups, rate limiting, Bot Control, Firewall Manager, Shield Advanced |
| Azure | references/azure.md | Workload identity, AKS, RBAC, resource tagging, Azure Policy |
| Terraform | references/terraform.md | Module design, state management, testing, CI/CD integration |
| GitHub Actions | references/github-actions.md | Security hardening, OIDC, SHA pinning, reusable workflows |
| Composite Actions | references/composite-actions.md | Composite action scaffolding, review, hardening, testing, release, private repo access |
| 🗺️ Platform model | references/platform-operating-model.md | Ownership boundaries, promotion flows, cross-tool design |
| 🔐 Secrets | references/secrets.md | External Secrets Operator, Sealed Secrets, provider setup, troubleshooting |
| Linkerd | references/linkerd.md | mTLS, proxy injection, AuthorizationPolicy, observability, multi-cluster |
| Linux and Networking | references/linux-networking.md | Linux admin, DNS, load balancing, VPC/VNet design, connectivity troubleshooting |
| 🧠 Platform Mindset | references/platform-mindset.md | DevEx, friction audits, RFC/ADR, incident comms, post-mortems, capacity planning |
| 🔒 Compliance | references/compliance.md | SOC 2 Trust Services Criteria in Terraform: IAM, encryption, detection, audit logging, backup, Checkov enforcement |
| Helm | references/helm.md | Chart scaffolding, values design, template patterns, security hardening, lint/validation pipeline, GitOps integration |
| 🔌 MCP | references/mcp.md | Model Context Protocol server/client development, TypeScript and Python SDKs, stdio/SSE transports, security, testing |
| ☁️ AWS MCP Profiles | references/aws-mcp-profiles.md | Multi-account AWS MCP server management — SSO, Granted, credential_process, profile discovery, VS Code and Claude Code config generation |
| Observability | references/observability.md | Structured logging, Prometheus metrics, OpenTelemetry tracing, Grafana dashboards, alerting rules, k6 load testing, capacity planning |
| 📝 Documentation | references/documentation.md | Docstrings (Google/NumPy/JSDoc), OpenAPI 3.1 specs, doc sites (MkDocs/TypeDoc), developer guides |
| Datadog | references/datadog.md | Agent Helm setup, APM instrumentation, log management, monitors/dashboards/SLOs as Terraform, pup CLI, Datadog Labs skills |
| 🤖 LLM Observability | references/llm-observability.md | Datadog LLMObs instrumentation (Python/Node.js), eval bootstrap, trace RCA, experiment analysis |
| Dynatrace | references/dynatrace.md | OneAgent Kubernetes Operator, custom metrics, SLOs, dashboards and alerting via Terraform provider |
| Conventional Commits | references/conventional-commits.md | Message structure, type classification, atomic staging, commitlint/husky/semantic-release tooling |
| 📋 OPA / Conftest | references/opa.md | Rego v1 syntax, rule types, unit tests, fmt/regal/verify validation pipeline, GitHub Actions integration |
| 🔍 PR Review | references/pr-review.md | Cost impact, environment drift, ownership gaps, SOC 2 compliance, deprecated API / version hygiene, rollback feasibility |
| 🧵 PR Comment Triage | commands/triage.md | /platform-skills:triage classifies PR comments, applies valid fixes, replies, and resolves review threads |
| ⚡ KEDA | references/keda.md | ScaledObject, ScaledJob, TriggerAuthentication, Prometheus/SQS/Kafka/Redis/Cron/HTTP/Azure scalers, scale-to-zero, IRSA, GitOps integration, troubleshooting — /platform-skills:keda |
| 🤖 Agent Self-Improvement | references/agent-self-improve.md | .learnings/ directory setup, LRN/ERR/FEAT entry lifecycle, WAL protocol, working buffer, VFM scoring, ADL decision logic, Six Operating Pillars, heartbeat, reverse prompting, proactive agent behavior — /platform-skills:self-improve |
| 🔗 Supply Chain Security | references/supply-chain.md | Cosign keyless signing, Syft SBOM generation and attestation, Trivy/Grype CVE scanning with severity gates, SLSA Level 2 provenance, Kyverno ImageValidatingPolicy enforcement — /platform-skills:supply-chain |
| 🦅 Runtime Security | references/runtime-security.md | Falco eBPF deployment on EKS/GKE, custom rule authoring, Falcosidekick alert routing, rule debugging, bridging Falco signals to Kyverno admission enforcement — /platform-skills:runtime-security |
| 💥 Chaos Engineering | references/chaos.md | Litmus Chaos v3 and Chaos Mesh v2 fault injection, steady-state hypothesis (httpProbe/promProbe), blast radius scoping, GameDay workflow, recurring schedules, DORA feedback loop — /platform-skills:chaos |
| 📊 DORA Metrics | references/dora.md | Deployment Frequency, Lead Time, Change Failure Rate, MTTR — GitHub Actions + Prometheus Pushgateway instrumentation, recording rules, Grafana dashboards, SaaS decision matrix, anti-pattern detection — /platform-skills:dora |
| ✨ Awesome Docs | references/awesome-docs.md | Animated GitHub-safe Markdown document generation — any doc type (README, architecture guide, runbook, tutorial, API reference, RFC, post-mortem, or custom), 4 SVG patterns, convert existing docs, diff for staleness, audit quality, local preview, multi-platform export — /platform-skills:awesome-docs |
| 🔄 Renovate | references/renovate.md | Dependency update automation — scan repo and generate renovate.json per ecosystem, private registry auth (ECR/GCR/ACR/Harbor/Helm OCI), custom regex managers for internal GitHub modules and private Terraform registries, pre-commit hook, GitHub Actions validation workflow — /platform-skills:renovate |
Every pattern in this handbook follows the same ground rules:
Every troubleshooting section in the handbook follows this consistent framework — from quick diagnosis to safe resolution:
| Step | What it answers |
|---|---|
| Symptom | Exact error and observable behavior |
| Evidence | Commands to run: logs, events, status |
| Hypothesis | Most likely root cause |
| Diagnosis | Commands that confirm or rule out the hypothesis |
| Fix | Specific change with justification |
| Validation | Post-fix verification steps |
| Prevention | How to avoid it next time |
| Rollback | Safe undo path if the fix makes things worse |
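The eight steps above can double as a note-taking skeleton during an incident. The following is a minimal sketch, not something the handbook ships: the function name `triage_note` and the Markdown layout of the note are assumptions made for illustration. It prints one empty heading per step, in the order given in the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints an empty incident note with one heading per
# step of the troubleshooting framework. The function name and the note
# layout are illustrative assumptions, not part of the handbook.
set -euo pipefail

triage_note() {
  local title="$1"
  local step
  printf '# %s\n' "$title"
  # Step names and their order are taken from the framework table.
  for step in Symptom Evidence Hypothesis Diagnosis Fix Validation Prevention Rollback; do
    printf '\n## %s\n\n' "$step"
  done
}

# Example: start a note for a stuck Flux Kustomization.
triage_note "Flux Kustomization stuck NotReady"
```

Redirect the output to a file and fill in each heading as you work. Keeping Rollback as a fixed heading makes it harder to close the note without writing down an undo path.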
No installation needed. Navigate directly:
git clone https://github.com/nitinjain999/platform-skills.git
cd platform-skills
# Copy examples directly into your project
cp -r examples/flux/basic-monorepo/* your-gitops-repo/
cp -r examples/terraform/eks-cluster/* your-terraform-modules/
cp examples/kubernetes/deployment-baseline.yaml your-k8s-manifests/
The plugin adds interactive guidance on top of the handbook. Claude will reference the right section automatically when you ask platform engineering questions in your editor, terminal, or browser.
From marketplace:
claude plugin marketplace add https://github.com/nitinjain999/platform-skills
claude plugin install platform-skills
From local clone (for customisation):
git clone https://github.com/nitinjain999/platform-skills.git
cd platform-skills
claude plugin install .
Upgrade to latest version:
claude plugins marketplace update platform-skills
claude plugins remove platform-skills
claude plugins install platform-skills
Codex discovers skills from the local skills directory. Clone this repository as the skill folder so SKILL.md, agents/openai.yaml, references/, and examples/ stay together:
mkdir -p "${CODEX_HOME:-$HOME/.codex}/skills"
git clone https://github.com/nitinjain999/platform-skills.git "${CODEX_HOME:-$HOME/.codex}/skills/platform-skills"
Then ask Codex naturally:
Use $platform-skills to review this Terraform change for ownership, blast radius, validation, and rollback.
Copy the Cursor-native rules into your project so every developer gets the same platform guidance in Cursor Chat and Agent:
cp platform-skills/.cursorrules your-project/.cursorrules
mkdir -p your-project/.cursor/rules
cp platform-skills/.cursor/rules/*.mdc your-project/.cursor/rules/
For VSCode, Copilot, Cursor, and JetBrains setup — project level and global level — see EDITOR_INTEGRATIONS.md.
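After the Codex clone and the Cursor copy steps, it can help to confirm that the expected files actually landed. The sketch below is an illustration only: `check_paths` is a hypothetical helper, not part of `install.sh`, and `your-project` is the same placeholder used in the copy commands. The paths it looks for are the ones named in the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical post-install check. The function name is an assumption; the
# paths it looks for are the ones named in the install steps.
set -eo pipefail

# Print every expected path that is missing under the given root and
# return non-zero if any are absent.
check_paths() {
  local root="$1"
  shift
  local missing=0
  local path
  for path in "$@"; do
    if [ ! -e "$root/$path" ]; then
      printf 'missing: %s/%s\n' "$root" "$path"
      missing=1
    fi
  done
  return "$missing"
}

# Codex: the skill folder should keep these four entries together.
codex_skill="${CODEX_HOME:-$HOME/.codex}/skills/platform-skills"
check_paths "$codex_skill" SKILL.md agents/openai.yaml references examples \
  || echo "Codex skill folder looks incomplete"

# Cursor: project-level rules file plus the scoped rules directory.
check_paths your-project .cursorrules .cursor/rules \
  || echo "Cursor rules not copied yet"
```

The check only tests that the paths exist. It does not validate their contents, so treat a clean run as a sanity check, not as proof that the integration is active.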
platform-skills/
├── references/ # Deep-dive guides — one per domain
│ ├── platform-operating-model.md
│ ├── kubernetes.md
│ ├── kyverno.md # Kyverno admission policies (v1.11.0)
│ ├── openshift.md
│ ├── argocd.md
│ ├── flux.md
│ ├── aws.md
│ ├── azure.md
│ ├── terraform.md
│ ├── github-actions.md
│ ├── secrets.md
│ ├── linkerd.md
│ ├── linux-networking.md
│ ├── platform-mindset.md
│ ├── compliance.md # SOC 2 controls in Terraform (v1.6.0)
│ ├── helm.md # Helm chart patterns, lint pipeline, values design
│ ├── pr-review.md # PR review: cost, drift, ownership, compliance, upgrade, rollback (v1.12.0)
│ ├── keda.md # KEDA event-driven autoscaling (v1.14.0)
│ ├── llm-observability.md # Datadog LLMObs: instrumentation, evals, trace RCA (v1.20.0)
│ └── awesome-docs.md # Animated SVG doc generation — 4 patterns, GitHub-safe CSS (v1.21.0)
│
├── examples/ # Working examples and handbook snippets
│ ├── flux/basic-monorepo/ # Complete Flux CD monorepo structure
│ ├── kubernetes/ # Namespace, deployment, network policy, PDB
│ ├── kyverno/ # ValidatingPolicy, GeneratingPolicy examples + kyverno-cli test manifest (v1.11.0)
│ ├── openshift/ # Route, ResourceQuota, LimitRange
│ ├── argocd/app-of-apps/ # Root application manifest
│ ├── aws/iam/ # Least-privilege IAM policy examples
│ ├── azure/workload-identity/ # Managed identity + federated credential
│ ├── terraform/eks-cluster/ # Production EKS Terraform module
│ ├── github-actions/ # CI/CD, Flux sync, container build workflows
│ ├── helm/web-service/ # Production Helm chart: Deployment, HPA, PDB, NetworkPolicy, schema
│ ├── triage/ # PR comment triage scenarios and fixtures (v1.13.0)
│ ├── keda/ # ScaledObject, ScaledJob, TriggerAuthentication examples (v1.14.0)
│ ├── awesome-docs/ # Animated SVG templates: arch-flow, lifecycle-loop, field-carousel, timeline-phases (v1.21.0)
│ └── compliance/ # SOC 2 Terraform examples (v1.6.0)
│     ├── checkov-config.yaml # Checkov config grouped by SOC 2 criterion
│     ├── iam/ # CC6.1/CC6.2: IAM, IRSA, OIDC, SCPs
│     ├── logging/ # CC7.2: CloudTrail, Config, VPC flow logs
│     ├── network/ # CC6.6: WAF, security groups, flow logs
│     ├── encryption-data-services/ # CC6.7: DynamoDB, ECR, ElastiCache, OpenSearch, Kinesis, EFS, Redshift
│     ├── vulnerability/ # CC6.8: Inspector v2, ECR scanning, SSM patching
│     ├── detection/ # CC7.1: GuardDuty, CIS CloudWatch alarms, Security Hub
│     ├── incident-response/ # CC7.3: SNS, EventBridge, PagerDuty
│     └── backup/ # A1.2/A1.3: Backup Plan, vault lock, cross-region DR
│
├── SKILL.md # Agent skill routing and patterns
├── agents/openai.yaml # Codex skill UI metadata
├── .cursorrules # Cursor project-level rules
├── .cursor/rules/ # Cursor scoped file rules
├── .claude-plugin/marketplace.json # Marketplace metadata
├── .github/workflows/ # Validation and release automation
├── tests/validate-skill.sh # Skill structure consistency checks
└── renovate.json # Automated dependency updates
Current release: v1.28.0, with 31 commands, 37 domain reference guides, and 50+ wiki pages.
Full version history is in CHANGELOG.md.
Planned
kube-bench CIS Benchmark integration
See CONTRIBUTING.md for how to propose new patterns, the development workflow, and release guidelines.
If Platform Skills saves you time, consider sponsoring to help keep it maintained and growing.
Every sponsor directly supports new domains, pattern updates, and the time spent validating every example in real environments.
Thanks goes to these wonderful people (emoji key):
This project follows the all-contributors specification. Contributions of any kind welcome!
Apache-2.0. See LICENSE for the full text and NOTICE for attribution.
If you create derivative works based on this project, retain the Apache 2.0 license text, existing copyright and attribution notices, and clearly mark any files you changed.