
nitinjain999/platform-skills

Production-grade platform engineering handbook — Kubernetes, Terraform, Flux CD, GitHub Actions, AWS, and more.


Platform Skills

A production-grade field handbook for platform, DevOps, SRE, and cloud engineers covering Kubernetes, Flux CD, Terraform, GitHub Actions, AWS, OPA/Rego, KEDA, supply chain security, Falco, observability, and more. Use it on GitHub, as a local reference, or with Claude, Codex, Cursor, and Copilot for interactive guidance with blast radius, validation steps, and rollback plans built in.


Works With

| Tool | What you get |
| --- | --- |
| Claude Code | Slash commands (/platform-skills:review, /platform-skills:debug, and 9 more), interactive guidance, automatic activation on relevant files |
| Codex | Skill invocation with $platform-skills, loaded on demand in any Codex session |
| Cursor | Project rules for Chat and Agent: platform review and generation in every file context |
| GitHub Copilot | Chat instructions committed to your repo, available to your whole team without individual installs |
| GitHub (no AI tool) | Browse references/ and examples/ directly as a standalone field handbook |

If this handbook saves you time, give it a star — it helps others find it. Found a gap or a better pattern? Contributions are welcome — open an issue, improve a reference guide, or add an example.


Why Platform Skills

Platform teams keep rediscovering the same hard lessons: unclear ownership, unsafe IAM, weak Kubernetes defaults, drifting GitOps overlays, CI checks that run too late, and rollback plans that only appear after an incident. Platform Skills turns those lessons into reusable guidance for the tools engineers already use.

Use it when you need a second brain for production platform work:

  • Review a Terraform, Helm, Kubernetes, Flux, GitHub Actions, or AWS change before it merges
  • Generate platform assets with security, observability, validation, and rollback already considered
  • Debug incidents with evidence-first troubleshooting instead of guesswork
  • Give every developer the same platform engineering baseline in Claude, Codex, Cursor, and Copilot

Install In 60 Seconds

Clone once, then install the integration your team uses:

git clone https://github.com/nitinjain999/platform-skills.git
cd platform-skills

| Tool | Best for | Quick install |
| --- | --- | --- |
| Claude Code | Interactive plugin workflows and slash commands | claude plugin marketplace add https://github.com/nitinjain999/platform-skills && claude plugin install platform-skills |
| Codex | Local skill invocation with $platform-skills | ./install.sh --codex |
| Cursor | Project rules for Chat and Agent | ./install.sh --cursor --target ../your-project |
| GitHub Copilot | Team-wide chat instructions committed to the repo | ./install.sh --copilot --target ../your-project |
| Everything | Local all-agent setup | ./install.sh --all --target ../your-project |

Need manual setup, global editor rules, or troubleshooting? See INSTALLATION.md.

Try It On Your Repo

See BEFORE_AFTER.md for side-by-side before/after examples across Kubernetes, Terraform, Flux CD, GitHub Actions, OPA/Rego, and PR triage. More copy-paste workflows are in PROMPTS.md. Example prompts:

Use $platform-skills to review this Terraform change for IAM scope, replacement risk, validation, and rollback.
Review this Kubernetes Deployment for production readiness: securityContext, resources, probes, HPA, PDB, and NetworkPolicy.
My Flux Kustomization is stuck NotReady. Walk me from evidence to fix to rollback.
Generate a production-ready GitHub Actions workflow with OIDC, pinned actions, cache safety, and least privilege.

What is this?

This repository is a reference handbook for developers, DevOps engineers, SREs, cloud engineers, and platform teams. It is structured in independent layers:

  • Handbook: references/ and examples/ are the main product. Every domain has a deep-dive guide and working example assets you can copy directly into your project. Use it on GitHub, from a local clone, or as a team knowledge base.
  • Claude plugin: SKILL.md and .claude-plugin/marketplace.json add an optional routing layer so Claude surfaces the right section of the handbook when you ask platform engineering questions interactively.
  • Codex skill: the repo root is a self-contained skill folder. SKILL.md provides routing, agents/openai.yaml provides Codex UI metadata, and references/ plus examples/ are loaded on demand.
  • Cursor rules: .cursorrules and .cursor/rules/*.mdc give Cursor project-level and scoped file rules for platform engineering reviews and generation.
  • Copilot instructions: .github/copilot-instructions.md lets teams commit the baseline into application and platform repositories.

All layers work independently. Agent integrations are optional.
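
Because the layers are independent, a clone or a target project can carry any subset of them. The sketch below is only an illustration: `detect_layers` is a hypothetical helper name, not something shipped in the repository, and it assumes the marker files are exactly the paths listed above.

```shell
#!/usr/bin/env bash
# detect_layers is a hypothetical helper, not part of the repository.
# It prints one line per layer whose marker files exist under the given directory.
detect_layers() {
  local root="${1:-.}"
  # Handbook: both content directories must be present.
  [ -d "$root/references" ] && [ -d "$root/examples" ] && echo "handbook"
  # Claude plugin: routing file plus marketplace metadata.
  [ -f "$root/SKILL.md" ] && [ -f "$root/.claude-plugin/marketplace.json" ] && echo "claude-plugin"
  # Codex skill: routing file plus Codex UI metadata.
  [ -f "$root/SKILL.md" ] && [ -f "$root/agents/openai.yaml" ] && echo "codex-skill"
  # Cursor rules: either the project-level file or the scoped rules directory.
  { [ -f "$root/.cursorrules" ] || [ -d "$root/.cursor/rules" ]; } && echo "cursor-rules"
  # Copilot instructions: a single committed file.
  [ -f "$root/.github/copilot-instructions.md" ] && echo "copilot-instructions"
  return 0
}
```

Running `detect_layers ../your-project` after an install is one way to confirm which integrations actually landed in the target directory.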

Navigate

| I want to... | Go to |
| --- | --- |
| Get started in 5 minutes | QUICKSTART.md |
| Understand how AI agents and skills work | HOW_IT_WORKS.md |
| Full installation guide and troubleshooting | INSTALLATION.md |
| Read a domain guide | references/ |
| Copy a working example | examples/ |
| Copy prompts for Claude, Codex, Cursor, or Copilot | PROMPTS.md |
| Install as a Claude plugin | Installation |
| Install as a Codex skill | Installation |
| Add Cursor rules | Editor integrations |
| Learn how to use each slash command | COMMANDS.md |
| Set up VSCode, Copilot, or Cursor | EDITOR_INTEGRATIONS.md |
| Contribute a pattern | CONTRIBUTING.md |

Domains

| Domain | Reference guide | What it covers |
| --- | --- | --- |
| Kubernetes | references/kubernetes.md | Cluster baseline, workload patterns, network policy, RBAC, pod security |
| 🛡️ Kyverno | references/kyverno.md | Validate/mutate/generate/verifyImages policies, Audit→Deny promotion, PolicyException, PolicyReport, kyverno-cli testing, PSP/Gatekeeper migration |
| OpenShift | references/openshift.md | Routes, SCC compatibility, operator usage, tenant isolation |
| Argo CD | references/argocd.md | App-of-apps design, ApplicationSet, sync control, promotion flows |
| Flux CD | references/fluxcd.md | Monorepo structure, reconciliation, multi-tenancy, image automation |
| ↳ Flux CD Sources | references/fluxcd-sources.md | GitRepository, OCIRepository, HelmRepository, Bucket, ArtifactGenerator |
| ↳ Flux CD ResourceSets | references/fluxcd-resourcesets.md | ResourceSet templating, input strategies, gitless fleet patterns |
| ↳ Flux CD Notifications | references/fluxcd-notifications.md | Provider, Alert, Receiver, Slack/Datadog/GitHub commit status |
| ↳ Flux CD Operator | references/fluxcd-operator.md | FluxInstance sizing, multi-tenancy, kustomize patches, FluxReport |
| ↳ Flux CD Kustomization | references/fluxcd-kustomization.md | CEL readyExpr, postBuild substitution, SOPS, SSA annotations |
| ↳ Flux CD HelmRelease | references/fluxcd-helmrelease.md | chartRef vs chart.spec, drift detection, post-renderers, CRD lifecycle |
| ↳ Flux CD Terraform | references/fluxcd-terraform.md | Flux Operator bootstrap via Terraform |
| ↳ Flux CD MCP | references/fluxcd-mcp.md | AI-assisted FluxCD debugging via Flux MCP server |
| ↳ Flux CD Migration | references/fluxcd-migration.md | v2.7/v2.8 API removals, CLI and Operator upgrade paths |
| ↳ Flux CD Security | references/fluxcd-security.md | Secrets, source auth, OCI supply chain, RBAC, image automation security |
| ↳ Flux CD Troubleshooting | references/fluxcd-troubleshooting.md | Incident cheat-sheet: symptom → cause → fix per controller |
| AWS | references/aws.md | IAM least-privilege, IRSA, EKS, resource tagging, cost allocation |
| AWS CloudFront | references/aws-cloudfront.md | Distributions, OAC, cache policies, security headers, Lambda@Edge, CloudFront Functions, multi-account |
| AWS WAF | references/aws-waf.md | Web ACLs, managed rule groups, rate limiting, Bot Control, Firewall Manager, Shield Advanced |
| Azure | references/azure.md | Workload identity, AKS, RBAC, resource tagging, Azure Policy |
| Terraform | references/terraform.md | Module design, state management, testing, CI/CD integration |
| GitHub Actions | references/github-actions.md | Security hardening, OIDC, SHA pinning, reusable workflows |
| Composite GitHub Actions | references/composite-actions.md | Composite action scaffolding, review, hardening, testing, release, private repo access |
| 🗺️ Platform model | references/platform-operating-model.md | Ownership boundaries, promotion flows, cross-tool design |
| 🔐 Secrets | references/secrets.md | External Secrets Operator, Sealed Secrets, provider setup, troubleshooting |
| Linkerd | references/linkerd.md | mTLS, proxy injection, AuthorizationPolicy, observability, multi-cluster |
| Linux & Networking | references/linux-networking.md | Linux admin, DNS, load balancing, VPC/VNet design, connectivity troubleshooting |
| 🧠 Platform Mindset | references/platform-mindset.md | DevEx, friction audits, RFC/ADR, incident comms, post-mortems, capacity planning |
| 🔒 Compliance | references/compliance.md | SOC 2 Trust Services Criteria in Terraform: IAM, encryption, detection, audit logging, backup, Checkov enforcement |
| Helm | references/helm.md | Chart scaffolding, values design, template patterns, security hardening, lint/validation pipeline, GitOps integration |
| 🔌 MCP | references/mcp.md | Model Context Protocol server/client development, TypeScript and Python SDKs, stdio/SSE transports, security, testing |
| ☁️ AWS MCP Profiles | references/aws-mcp-profiles.md | Multi-account AWS MCP server management: SSO, Granted, credential_process, profile discovery, VS Code and Claude Code config generation |
| Observability | references/observability.md | Structured logging, Prometheus metrics, OpenTelemetry tracing, Grafana dashboards, alerting rules, k6 load testing, capacity planning |
| 📝 Documentation | references/documentation.md | Docstrings (Google/NumPy/JSDoc), OpenAPI 3.1 specs, doc sites (MkDocs/TypeDoc), developer guides |
| Datadog | references/datadog.md | Agent Helm setup, APM instrumentation, log management, monitors/dashboards/SLOs as Terraform, pup CLI, Datadog Labs skills |
| 🤖 LLM Observability | references/llm-observability.md | Datadog LLMObs instrumentation (Python/Node.js), eval bootstrap, trace RCA, experiment analysis |
| Dynatrace | references/dynatrace.md | OneAgent Kubernetes Operator, custom metrics, SLOs, dashboards and alerting via Terraform provider |
| Conventional Commits | references/conventional-commits.md | Message structure, type classification, atomic staging, commitlint/husky/semantic-release tooling |
| 📋 OPA / Conftest | references/opa.md | Rego v1 syntax, rule types, unit tests, fmt/regal/verify validation pipeline, GitHub Actions integration |
| 🔍 PR Review | references/pr-review.md | Cost impact, environment drift, ownership gaps, SOC 2 compliance, deprecated API / version hygiene, rollback feasibility |
| 🧵 PR Comment Triage | commands/triage.md | /platform-skills:triage classifies PR comments, applies valid fixes, replies, and resolves review threads |
| ⚡ KEDA | references/keda.md | ScaledObject, ScaledJob, TriggerAuthentication, Prometheus/SQS/Kafka/Redis/Cron/HTTP/Azure scalers, scale-to-zero, IRSA, GitOps integration, troubleshooting. Command: /platform-skills:keda |
| 🤖 Agent Self-Improvement | references/agent-self-improve.md | .learnings/ directory setup, LRN/ERR/FEAT entry lifecycle, WAL protocol, working buffer, VFM scoring, ADL decision logic, Six Operating Pillars, heartbeat, reverse prompting, proactive agent behavior. Command: /platform-skills:self-improve |
| 🔗 Supply Chain Security | references/supply-chain.md | Cosign keyless signing, Syft SBOM generation and attestation, Trivy/Grype CVE scanning with severity gates, SLSA Level 2 provenance, Kyverno ImageValidatingPolicy enforcement. Command: /platform-skills:supply-chain |
| 🦅 Runtime Security | references/runtime-security.md | Falco eBPF deployment on EKS/GKE, custom rule authoring, Falcosidekick alert routing, rule debugging, bridging Falco signals to Kyverno admission enforcement. Command: /platform-skills:runtime-security |
| 💥 Chaos Engineering | references/chaos.md | Litmus Chaos v3 and Chaos Mesh v2 fault injection, steady-state hypothesis (httpProbe/promProbe), blast radius scoping, GameDay workflow, recurring schedules, DORA feedback loop. Command: /platform-skills:chaos |
| 📊 DORA Metrics | references/dora.md | Deployment Frequency, Lead Time, Change Failure Rate, MTTR; GitHub Actions + Prometheus Pushgateway instrumentation, recording rules, Grafana dashboards, SaaS decision matrix, anti-pattern detection. Command: /platform-skills:dora |
| ✨ Awesome Docs | references/awesome-docs.md | Animated GitHub-safe Markdown document generation: any doc type (README, architecture guide, runbook, tutorial, API reference, RFC, post-mortem, or custom), 4 SVG patterns, convert existing docs, diff for staleness, audit quality, local preview, multi-platform export. Command: /platform-skills:awesome-docs |
| 🔄 Renovate | references/renovate.md | Dependency update automation: scan repo and generate renovate.json per ecosystem, private registry auth (ECR/GCR/ACR/Harbor/Helm OCI), custom regex managers for internal GitHub modules and private Terraform registries, pre-commit hook, GitHub Actions validation workflow. Command: /platform-skills:renovate |

Core principles

Every pattern in this handbook follows the same ground rules:

  • Production-first — patterns are battle-tested, not theoretical
  • Root-cause over symptom — troubleshooting works backwards from evidence to fix
  • Explicit blast radius — every risky operation documents scope and rollback
  • Security by default — least-privilege IAM, restricted pod security, SHA-pinned actions
  • Rollback plans are mandatory — if you cannot safely undo it, the guide is incomplete
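
The "SHA-pinned actions" rule is one that can be checked mechanically. The sketch below is an illustration only: `find_unpinned_actions` is a hypothetical helper name, and it assumes each `uses:` entry sits on its own line and is unquoted. Local (`./`) and `docker://` references are skipped because they are not pinned by Git commit SHA.

```shell
#!/usr/bin/env bash
# find_unpinned_actions is a hypothetical helper, not part of the repository.
# It prints every `uses:` line in the given workflow files that references a
# remote action without a full 40-character commit SHA.
find_unpinned_actions() {
  # 1. Keep only lines that declare an action or reusable workflow.
  # 2. Drop local and docker:// references (assumed out of scope here).
  # 3. Drop references already pinned to a 40-character hex SHA.
  grep -HnE '^[[:space:]]*(-[[:space:]]+)?uses:[[:space:]]' "$@" \
    | grep -vE 'uses:[[:space:]]+(\./|docker://)' \
    | grep -vE '@[0-9a-f]{40}([[:space:]]|$)'
}
```

A call such as `find_unpinned_actions .github/workflows/*.yml` lists offending lines with file name and line number. The function inherits the exit status of the last `grep`, so it returns non-zero when nothing unpinned is found.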

Troubleshooting structure

Every troubleshooting section in the handbook follows this consistent framework — from quick diagnosis to safe resolution:

| Step | What it answers |
| --- | --- |
| Symptom | Exact error and observable behavior |
| Evidence | Commands to run: logs, events, status |
| Hypothesis | Most likely root cause |
| Diagnosis | Commands that confirm or rule out the hypothesis |
| Fix | Specific change with justification |
| Validation | Post-fix verification steps |
| Prevention | How to avoid it next time |
| Rollback | Safe undo path if the fix makes things worse |
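
When writing up your own incidents, the same eight steps can serve as a skeleton. The sketch below is one possible way to scaffold an empty write-up; `troubleshooting_template` is a hypothetical helper name and the plain-text layout is an assumption, not a format the handbook prescribes.

```shell
#!/usr/bin/env bash
# troubleshooting_template is a hypothetical helper, not part of the repository.
# It prints a title followed by the eight steps, in order, as empty sections.
troubleshooting_template() {
  local title="${1:-Untitled incident}"
  local step
  printf '%s\n\n' "$title"
  for step in Symptom Evidence Hypothesis Diagnosis Fix Validation Prevention Rollback; do
    printf '%s:\n\n' "$step"
  done
}
```

For example, `troubleshooting_template "Flux Kustomization stuck NotReady" > incident.txt` creates a file whose sections can then be filled in from evidence to rollback.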

Installation

Browse on GitHub

No installation needed. Browse references/ and examples/ directly in the repository.

Clone for local templates

git clone https://github.com/nitinjain999/platform-skills.git
cd platform-skills

# Copy examples directly into your project
cp -r examples/flux/basic-monorepo/*          your-gitops-repo/
cp -r examples/terraform/eks-cluster/*         your-terraform-modules/
cp    examples/kubernetes/deployment-baseline.yaml  your-k8s-manifests/

Install as a Claude plugin

The plugin adds interactive guidance on top of the handbook. Claude will reference the right section automatically when you ask platform engineering questions in your editor, terminal, or browser.

From marketplace:

claude plugin marketplace add https://github.com/nitinjain999/platform-skills
claude plugin install platform-skills

From local clone (for customisation):

git clone https://github.com/nitinjain999/platform-skills.git
cd platform-skills
claude plugin install .

Upgrade to latest version:

claude plugins marketplace update platform-skills
claude plugins remove platform-skills
claude plugins install platform-skills

Install as a Codex skill

Codex discovers skills from the local skills directory. Clone this repository as the skill folder so SKILL.md, agents/openai.yaml, references/, and examples/ stay together:

mkdir -p "${CODEX_HOME:-$HOME/.codex}/skills"
git clone https://github.com/nitinjain999/platform-skills.git "${CODEX_HOME:-$HOME/.codex}/skills/platform-skills"
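
After cloning, it can be worth confirming that the four pieces named above really did land together. The following is a minimal sketch under that assumption; `verify_codex_skill` is a hypothetical helper name, and the default path simply mirrors the clone target used in the command above.

```shell
#!/usr/bin/env bash
# verify_codex_skill is a hypothetical helper, not part of the repository.
# It checks that SKILL.md, agents/openai.yaml, references/, and examples/
# exist in the skill folder, prints any that are missing, and returns
# non-zero if at least one is absent.
verify_codex_skill() {
  local dir="${1:-${CODEX_HOME:-$HOME/.codex}/skills/platform-skills}"
  local missing=0
  local path
  for path in SKILL.md agents/openai.yaml references examples; do
    if [ ! -e "$dir/$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done
  return "$missing"
}
```

Called with no arguments, it inspects the default skill location; pass a directory to check a clone kept elsewhere.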

Then ask Codex naturally:

Use $platform-skills to review this Terraform change for ownership, blast radius, validation, and rollback.

Install Cursor rules

Copy the Cursor-native rules into your project so every developer gets the same platform guidance in Cursor Chat and Agent:

cp platform-skills/.cursorrules your-project/.cursorrules
mkdir -p your-project/.cursor/rules
cp platform-skills/.cursor/rules/*.mdc your-project/.cursor/rules/

For VSCode, Copilot, Cursor, and JetBrains setup — project level and global level — see EDITOR_INTEGRATIONS.md.

Repository structure

platform-skills/
├── references/                         # Deep-dive guides — one per domain
│   ├── platform-operating-model.md
│   ├── kubernetes.md
│   ├── kyverno.md                      # Kyverno admission policies (v1.11.0)
│   ├── openshift.md
│   ├── argocd.md
│   ├── flux.md
│   ├── aws.md
│   ├── azure.md
│   ├── terraform.md
│   ├── github-actions.md
│   ├── secrets.md
│   ├── linkerd.md
│   ├── linux-networking.md
│   ├── platform-mindset.md
│   ├── compliance.md                   # SOC 2 controls in Terraform (v1.6.0)
│   ├── helm.md                         # Helm chart patterns, lint pipeline, values design
│   ├── pr-review.md                    # PR review: cost, drift, ownership, compliance, upgrade, rollback (v1.12.0)
│   ├── keda.md                         # KEDA event-driven autoscaling (v1.14.0)
│   ├── llm-observability.md            # Datadog LLMObs: instrumentation, evals, trace RCA (v1.20.0)
│   └── awesome-docs.md                 # Animated SVG doc generation: 4 patterns, GitHub-safe CSS (v1.21.0)
│
├── examples/                           # Working examples and handbook snippets
│   ├── flux/basic-monorepo/            # Complete Flux CD monorepo structure
│   ├── kubernetes/                     # Namespace, deployment, network policy, PDB
│   ├── kyverno/                        # ValidatingPolicy, GeneratingPolicy examples + kyverno-cli test manifest (v1.11.0)
│   ├── openshift/                      # Route, ResourceQuota, LimitRange
│   ├── argocd/app-of-apps/             # Root application manifest
│   ├── aws/iam/                        # Least-privilege IAM policy examples
│   ├── azure/workload-identity/        # Managed identity + federated credential
│   ├── terraform/eks-cluster/          # Production EKS Terraform module
│   ├── github-actions/                 # CI/CD, Flux sync, container build workflows
│   ├── helm/web-service/               # Production Helm chart: Deployment, HPA, PDB, NetworkPolicy, schema
│   ├── triage/                         # PR comment triage scenarios and fixtures (v1.13.0)
│   ├── keda/                           # ScaledObject, ScaledJob, TriggerAuthentication examples (v1.14.0)
│   ├── awesome-docs/                   # Animated SVG templates: arch-flow, lifecycle-loop, field-carousel, timeline-phases (v1.21.0)
│   └── compliance/                     # SOC 2 Terraform examples (v1.6.0)
│       ├── checkov-config.yaml         # Checkov config grouped by SOC 2 criterion
│       ├── iam/                        # CC6.1/CC6.2: IAM, IRSA, OIDC, SCPs
│       ├── logging/                    # CC7.2: CloudTrail, Config, VPC flow logs
│       ├── network/                    # CC6.6: WAF, security groups, flow logs
│       ├── encryption-data-services/   # CC6.7: DynamoDB, ECR, ElastiCache, OpenSearch, Kinesis, EFS, Redshift
│       ├── vulnerability/              # CC6.8: Inspector v2, ECR scanning, SSM patching
│       ├── detection/                  # CC7.1: GuardDuty, CIS CloudWatch alarms, Security Hub
│       ├── incident-response/          # CC7.3: SNS, EventBridge, PagerDuty
│       └── backup/                     # A1.2/A1.3: Backup Plan, vault lock, cross-region DR
│
├── SKILL.md                            # Agent skill routing and patterns
├── agents/openai.yaml                  # Codex skill UI metadata
├── .cursorrules                        # Cursor project-level rules
├── .cursor/rules/                      # Cursor scoped file rules
├── .claude-plugin/marketplace.json     # Marketplace metadata
├── .github/workflows/                  # Validation and release automation
├── tests/validate-skill.sh             # Skill structure consistency checks
└── renovate.json                       # Automated dependency updates

Roadmap

Current release: v1.28.0 — 31 commands, 37 domain reference guides, 50+ wiki pages.

Full version history is in CHANGELOG.md.

Planned

  • GCP: landing zone, GKE, Workload Identity, and IAM patterns
  • Istio: traffic management, mTLS, telemetry (counterpart to Linkerd domain)
  • SOC 2 for Kubernetes: Kyverno policies mapped to TSC criteria, pod security admission, kube-bench CIS Benchmark integration
  • OpenShift operator lifecycle: OLM, CatalogSource, operator upgrade patterns
  • Argo CD ApplicationSet fleet patterns: cluster generators, matrix strategies, progressive rollout
  • Multi-cloud networking: Transit Gateway, VNet peering, PrivateLink, cross-cloud DNS

Contributing

See CONTRIBUTING.md for how to propose new patterns, the development workflow, and release guidelines.

Sponsor

If Platform Skills saves you time, consider sponsoring to help keep it maintained and growing.

Every sponsor directly supports new domains, pattern updates, and the time spent validating every example in real environments.


Contributors ✨

Thanks go to everyone who has contributed. This project follows the all-contributors specification. Contributions of any kind are welcome!


License

Apache-2.0. See LICENSE for the full text and NOTICE for attribution.

If you create derivative works based on this project, retain the Apache 2.0 license text, existing copyright and attribution notices, and clearly mark any files you changed.
