nitinjain999/platform-skills

Production-grade platform engineering handbook — Kubernetes, Terraform, Flux CD, GitHub Actions, AWS, and more.

GETTING_STARTED.md

Getting Started

Who this is for

Platform engineers and developers who want to understand how to use the handbook effectively — which tool owns what, how to ask good questions, and where to look for patterns.

If you are still setting up, start with QUICKSTART.md or INSTALLATION.md first, then come back here.


Part 1 — For platform engineers

The ownership model

Do not ask about tools in isolation. Ask from the platform point of view. Every infrastructure decision belongs to exactly one layer:

| Layer | Owns |
| --- | --- |
| Terraform | Cloud resources, cluster bootstrap, IAM, networking, secrets backends |
| Kubernetes / OpenShift | Workload specs, RBAC, network policy, resource limits |
| Flux / Argo CD | In-cluster state, HelmReleases, workload promotion |
| GitHub Actions | CI validation, artifact publish, promotion triggers |

Critical rules:

  • Choose either Flux or Argo CD for a given ownership boundary — never both against the same boundary without a migration plan
  • Terraform bootstraps; Flux/Argo CD reconciles. Flux does not manage cloud resources
  • GitHub Actions must not hold long-lived environment truth in workflow YAML
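
The one-owner rule lends itself to a mechanical check. The following is a minimal Python sketch, not something shipped with the handbook: the `claims` inventory, the resource names, and `find_ownership_conflicts` are all hypothetical. They illustrate flagging any resource that more than one layer claims.

```python
from collections import defaultdict

# Hypothetical inventory of (layer, resource) pairs. The resource names are
# illustrative only and do not come from this repo.
claims = [
    ("Terraform", "aws_iam_role/app-runtime"),
    ("Terraform", "eks-cluster/prod"),
    ("Flux", "HelmRelease/ingress-nginx"),
    ("Argo CD", "HelmRelease/ingress-nginx"),  # same resource, second owner
    ("GitHub Actions", "artifact/app-image"),
]


def find_ownership_conflicts(claims):
    """Return {resource: sorted layers} for resources with more than one owner."""
    owners = defaultdict(set)
    for layer, resource in claims:
        owners[resource].add(layer)
    return {res: sorted(layers) for res, layers in owners.items() if len(layers) > 1}


conflicts = find_ownership_conflicts(claims)
for resource, layers in conflicts.items():
    print(f"{resource} is claimed by {', '.join(layers)}")
```

A non-empty result means a boundary needs an explicit owner or a migration plan before further changes.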

Where to look in this repo

Start here based on your task:

| Task | File |
| --- | --- |
| Ownership boundaries and repo topology | references/platform-operating-model.md |
| Kubernetes baseline patterns | references/kubernetes.md |
| Terraform module and state design | references/terraform.md |
| FluxCD GitOps patterns (overview) | references/fluxcd.md |
| FluxCD source CRDs | references/fluxcd-sources.md |
| FluxCD ResourceSet and fleet templating | references/fluxcd-resourcesets.md |
| FluxCD notifications (Alert, Receiver) | references/fluxcd-notifications.md |
| FluxCD Operator (FluxInstance, FluxReport) | references/fluxcd-operator.md |
| FluxCD Kustomization advanced | references/fluxcd-kustomization.md |
| FluxCD HelmRelease advanced | references/fluxcd-helmrelease.md |
| FluxCD Terraform bootstrap | references/fluxcd-terraform.md |
| FluxCD MCP server (AI debugging) | references/fluxcd-mcp.md |
| FluxCD API migration (v2.7/v2.8) | references/fluxcd-migration.md |
| FluxCD security audit checklist | references/fluxcd-security.md |
| Argo CD patterns | references/argocd.md |
| AWS platform guidance | references/aws.md |
| Azure platform guidance | references/azure.md |
| GitHub Actions security | references/github-actions.md |
| Composite GitHub Actions | references/composite-actions.md |
| SOC 2 controls in Terraform | references/compliance.md |
| Helm chart patterns | references/helm.md |
| Kyverno admission policies | references/kyverno.md |
| OPA / Conftest Rego policies | references/opa.md |
| PR review dimensions | references/pr-review.md |

Examples under examples/ are meant to be adapted, not copied blindly into production.

Good prompts

Include: what you are trying to do, which platform, which tool owns the change, the actual file or error, and the desired end state.

Review this Terraform layout for a multi-environment EKS platform. I want clear separation between reusable modules and live environment state.
[paste layout]

My Argo CD application is out of sync after a merge. Here is the manifest and sync status. What is the most likely root cause and what evidence should I collect first?

I run OpenShift on AWS. Should ingress, cert-manager, and observability be managed by Terraform or GitOps?

Review this GitHub Actions workflow for OIDC, least privilege, and unsafe trigger choices.
[paste workflow]

Common mistakes to avoid

  • Mixing Terraform and GitOps ownership for the same resource
  • Using both Flux and Argo CD against the same boundary without a migration plan
  • Asking broad questions without sharing the actual file, YAML, or error
  • Treating examples in this repo as complete production systems without adapting them
  • Putting environment truth into GitHub Actions workflow YAML

Simple workflow loop

  1. Identify the platform problem
  2. Identify the owning layer
  3. Open the matching reference file
  4. Ask for a concrete recommendation or review
  5. Apply the smallest useful change first
  6. Validate before expanding the pattern
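
Steps 2 and 3 of the loop amount to a lookup from owning layer to reference file. The sketch below is illustrative only. The file paths are taken from the "Where to look in this repo" table, but pairing each layer with a single entry-point file, the fallback choice, and the `reference_for` helper are assumptions made for this example.

```python
# Entry-point reference file per owning layer. The paths come from the
# reference table in this guide; the layer-to-file pairing is an assumption.
REFERENCE_BY_LAYER = {
    "terraform": "references/terraform.md",
    "kubernetes": "references/kubernetes.md",
    "flux": "references/fluxcd.md",
    "argo cd": "references/argocd.md",
    "github actions": "references/github-actions.md",
}

# Assumed fallback when the owning layer is not yet clear.
FALLBACK = "references/platform-operating-model.md"


def reference_for(layer):
    """Return the reference file to open first for the given owning layer."""
    return REFERENCE_BY_LAYER.get(layer.strip().lower(), FALLBACK)


print(reference_for("Flux"))
print(reference_for("not sure yet"))
```

Falling back to the operating-model reference reflects the idea that an unclear owner is an ownership-boundary question before it is a tool question.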

Part 2 — For new agent users

How Claude and Codex use this skill

When you install platform-skills as a Claude plugin, Codex skill, or Cursor ruleset, the agent loads the right platform guidance based on what you are working on. You do not need to tell the agent which file to read — it activates from context (the file types you paste, the tools you mention, the error text).

You can also invoke any workflow explicitly:

Use $platform-skills with the kyverno workflow to generate a ValidatingPolicy that requires team labels
Use $platform-skills with the pr-review rollback workflow to score the feasibility of this change

How to get concrete answers

Agents work best with concrete input. Always include:

  • the actual file or manifest (not a description of it)
  • the exact error message
  • the cluster, environment, or cloud provider
  • the desired end state
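
The checklist above can be applied before a prompt is sent. This is a small Python sketch under stated assumptions: the key names in `REQUIRED_INPUTS` and the `missing_inputs` helper are hypothetical labels for the four items and are not part of the skill.

```python
# Hypothetical labels for the four inputs listed above.
REQUIRED_INPUTS = (
    "file_or_manifest",
    "error_message",
    "environment",
    "desired_end_state",
)


def missing_inputs(prompt_parts):
    """Return the required inputs that are absent or blank in prompt_parts."""
    return [key for key in REQUIRED_INPUTS if not prompt_parts.get(key, "").strip()]


draft = {
    "error_message": "context deadline exceeded",
    "environment": "",  # a blank value counts as missing
}
print(missing_inputs(draft))
```

An empty list means the draft covers all four items; anything else names what to paste in before asking.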

Too vague:

How do I fix my Flux reconciliation?

Concrete — gets a useful answer:

My Flux Kustomization `apps` is stuck in NotReady with: "context deadline exceeded". I merged 20 minutes ago. Here is the output of `flux get kustomizations -A`:
[paste output]

What the skill cannot do

  • It cannot run kubectl, terraform, or git commands on your behalf — it explains what to run
  • It cannot see your cluster or cloud account — paste the relevant output
  • It works best on one concrete problem at a time, not "review everything"

All 31 command workflows

See COMMANDS.md for every command with modes and example prompts:

| Command | Use it for |
| --- | --- |
| review | Production-readiness check on any manifest, Terraform, workflow |
| debug | Structured troubleshooting for any platform symptom |
| terraform | Blast radius, IAM least privilege, SOC 2, state impact |
| gitops | Flux / Argo CD: debug live issues or audit a GitOps repo |
| helmcheck | Scaffold, review, or security-audit a Helm chart |
| kyverno | Generate, test, audit, or migrate Kyverno policies |
| opa | Generate, test, or debug OPA/Conftest Rego policies |
| compliance | SOC 2 gap analysis, control implementation, audit evidence |
| pr-review | Cost, drift, ownership, compliance, upgrade, rollback |
| observability | Instrument, alert, dashboard, load test, capacity |
| commit | Conventional commit message generation and validation |
| linkerd | mTLS, proxy injection, policy, multi-cluster diagnostics |
| linux | DNS, load balancer, VPC, process, disk, networking |
| datadog | Agent setup, APM, monitors, SLOs, incident investigation |
| dynatrace | Operator, instrumentation, SLOs, Davis AI investigation |
| document | Docstrings, OpenAPI specs, docs sites, guides |
| mcp | Scaffold, review, or debug an MCP server |
| aws-profile | Discover, switch, and validate AWS profiles for MCP servers |
| product | DevEx audit, RFC/ADR, incident update, post-mortem |
| triage | Use /platform-skills:triage to classify, fix, reply to, and resolve PR comments |
| keda | Use /platform-skills:keda to generate, debug, review, or design a KEDA scaling strategy |
| self-improve | Bootstrap .learnings/ workspace, log errors and learnings, resume after interruption, promote to project memory |
| supply-chain | Sign images, generate SBOMs, run CVE gates, enforce image signatures, generate SLSA provenance |
| runtime-security | Deploy Falco with eBPF, write custom rules, route alerts, debug rule firing, bridge to Kyverno |
| chaos | Install Litmus Chaos or Chaos Mesh, generate fault experiments, schedule chaos, run GameDay, debug, report |
| dora | Instrument DORA metrics, generate Grafana dashboards, benchmark against performance bands, debug metric gaps |
| awesome-docs | Generate any animated Markdown doc (README, architecture guide, runbook, tutorial, RFC, post-mortem, or custom), convert existing Markdown, update/diff/audit diagrams, export |
| aws | Generate or review CloudFront, WAF, Lambda@Edge, CloudFront Functions, and Firewall Manager patterns |
| composite-actions | Scaffold, review, secure, debug, publish, and improve composite GitHub Actions |
| fluxcd | FluxCD entry point: routes to gitops debug, gitops audit, helmcheck, or review based on input |
| renovate | Generate renovate.json covering all dep file types in the repo, or emit a GHA validation workflow |

How the agent and skill system work

If you want to understand what is happening under the hood — how Claude loads skills, what activates them, and what the agent can and cannot do — read HOW_IT_WORKS.md.


Next steps

  1. QUICKSTART.md — if you have not installed yet
  2. README.md — domain table and repo overview
  3. The one reference file closest to your current task
  4. COMMANDS.md — when you want to explore specific workflows
