
nitinjain999/platform-skills

Production-grade platform engineering handbook — Kubernetes, Terraform, Flux CD, GitHub Actions, AWS, and more.


SKILL.md

name: platform-skills
description: Use when troubleshooting, implementing, reviewing, or auditing platform infrastructure as a system, where Kubernetes, GitOps, CI/CD, and security concerns intersect. Provides structured diagnosis with blast radius, validation steps, and rollback plan for: Kubernetes, Flux CD, Argo CD, Terraform, GitHub Actions (composite actions, OIDC, SHA pinning), AWS, Azure, GKE, Linkerd, KEDA, supply chain security (Cosign, SBOM, SLSA), Falco, Chaos Engineering, DORA metrics, Datadog/Dynatrace/LLM observability, SOC 2, and PR review.

Platform Skills

Use this skill for hands-on help with Kubernetes, GitOps, cloud infrastructure, CI/CD, secrets management, service mesh, Linux administration, networking, and platform product thinking — whether you are a solo developer or part of a large platform team.

Pick the Right Tool for the Job

| Layer | When to use |
| --- | --- |
| Terraform | Cloud primitives, cluster bootstrap, IAM, networking, secrets backends |
| Kubernetes | Workload, RBAC, network policy, platform baseline across distributions |
| OpenShift | Kubernetes patterns adapted to OpenShift routing, SCC, and OLM |
| Flux / Argo CD | In-cluster reconciliation, Helm releases, workload promotion |
| GitHub Actions | Validate, package, gate, and promote. Keep workflows declarative. |
| AWS / Azure / GKE | Provider-specific account, identity, and governance patterns |
| Linkerd | Automatic mTLS, golden-signal observability, traffic management |
| Linux & Networking | DNS, load balancer routing, VPC/VNet, kernel tuning, connectivity |
| Compliance | SOC 2 controls in Terraform: IAM, encryption, audit logging, Checkov |
| Helm (Helmcheck) | Chart scaffolding, lint/validate pipeline, values design, security hardening |
| MCP | Build/debug MCP servers: tools, resources, transports, auth |
| AWS MCP Profiles | Discover/switch AWS profiles across VS Code + Claude Code MCP configs: multi-account, SSO, Granted, credential_process |
| Observability | Prometheus, OpenTelemetry, Grafana, alerting, k6 load tests, capacity |
| Documentation | Docstrings (Google/NumPy/JSDoc), OpenAPI 3.1, MkDocs, guides |
| Datadog | Agent on Kubernetes, APM, monitors, dashboards, SLOs, LLMObs |
| Dynatrace | OneAgent Operator, auto-instrumentation, anomaly detection, SLOs |
| Conventional Commits | Generate WHY-driven commit messages, atomic staging, validate |
| OPA / Conftest | Rego policies, unit tests, fmt/regal/verify pipeline, debug |
| Kyverno | CEL-based ValidatingPolicy, MutatingPolicy, ImageValidatingPolicy |
| PR Review | Cost, drift, ownership, SOC 2, deprecated APIs, rollback feasibility |
| PR Triage | Classify comments ACTIONABLE_FIX/INFORMATIONAL/NOT_APPLICABLE, fix, reply |
| KEDA | ScaledObject/ScaledJob, all scalers, TriggerAuthentication, scale-to-zero |
| Agent Self-Improvement | .learnings/ workspace, LRN/ERR lifecycle, WAL, VFM, ADL |
| Supply Chain Security | Cosign signing, Syft SBOM, Trivy/Grype CVE gates, SLSA Level 2 |
| Runtime Security | Falco eBPF, custom rules, Falcosidekick routing, Kyverno enforcement |
| Awesome Docs | Animated SVG Markdown: README, runbook, RFC, architecture, post-mortem |
| Composite Actions | Full action repo scaffold, SHA pinning, secrets-as-inputs, actionlint |
| GitOps debug | 5-workflow structured debug → 5-section report with root cause |
| GitOps audit | 6-phase repo audit → prioritized Critical/Warning/Info report |
| Platform Mindset | DevEx, friction audits, RFC/ADR, incident communication, post-mortems |
| Renovate | Dependency update automation: generate renovate.json from repo scan, emit GHA validation workflow |

If a task spans multiple areas, decide which layer owns the source of truth and keep the other layers consumers of that state.

Apply These Platform Rules

  • Separate reusable platform building blocks from live environment configuration.
  • Prefer GitOps pull-based reconciliation for cluster state and CI push-based automation for validation and packaging.
  • Choose either Flux or Argo CD for a given ownership boundary unless the task is explicitly about migration between them.
  • Keep Terraform responsible for bootstrapping clusters, cloud resources, secrets backends, and access primitives. Do not let Flux or Argo CD recreate those foundations unless there is a deliberate controller-based design.
  • Use Flux or Argo CD for in-cluster add-ons, workloads, Helm releases, and app-level environment promotion after bootstrap.
  • Use GitHub Actions for checks, plans, policy gates, artifact publishing, and promotion orchestration. Do not store long-lived environment truth in workflow YAML.
  • Prefer OIDC or workload identity over static cloud credentials.
  • Model environments explicitly. Promotion should be visible in Git history and reversible by commit rollback.
  • For Linux and networking changes, validate at each layer before escalating. Confirm the process is listening (ss -tulnp), then check L3 reachability (ping), L4 connectivity (nc -zv), and the L7 response (curl -v). Review security group / NACL rules last. ICMP is often filtered, so a failed ping alone is not conclusive. Do not skip layers.
  • For every Terraform change, enforce this order: terraform fmt -check -recursive, terraform validate, conftest test (OPA/Rego policy gates, blocking), tflint --recursive, a security scan (tfsec or checkov), then terraform plan. Do not let format, validation, lint, or policy failures reach the plan step.
  • For every Helm chart change, enforce in order: helm lint --strict, helm template --debug, kubeconform -strict -summary on rendered output, checkov on rendered manifests, then helm test in-cluster. Fail CI on any helm lint --strict warning.
  • Enforce a tag baseline on all cloud resources. The specific keys are an organizational decision. Use AWS default_tags (provider level) or Azure merge(local.common_tags, {...}) (module local) so the baseline is applied once, not repeated per resource. Back it with AWS Tag Policies or Azure Policy so resources created outside Terraform are also covered.
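The Linux and networking rule can be sketched in Python as a check that walks the layers and stops at the first one that fails. This is a minimal sketch under stated assumptions. It replaces the CLI tools with socket-level checks: name resolution via getaddrinfo, a TCP handshake instead of nc -zv, and one plain-HTTP GET instead of curl -v. The listener check (ss -tulnp) and ICMP ping stay with the CLI, because they need host access or raw-socket privileges. The function name layered_check and its result shape are hypothetical.

```python
import http.client
import socket
from typing import List, Tuple

# Hypothetical result shape: (layer, ok, detail). The checks stop at the first
# failing layer, mirroring "validate at each layer before escalating".
Result = Tuple[str, bool, str]


def layered_check(host: str, port: int, path: str = "/", timeout: float = 3.0) -> List[Result]:
    results: List[Result] = []

    # Name resolution: if the name does not resolve, higher layers are moot.
    try:
        addr = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4][0]
        results.append(("dns", True, addr))
    except socket.gaierror as exc:
        results.append(("dns", False, str(exc)))
        return results

    # L4: complete a TCP handshake, roughly what `nc -zv host port` verifies.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
        results.append(("l4", True, f"{host}:{port} accepted TCP"))
    except OSError as exc:  # connection refused, timed out, or unreachable
        results.append(("l4", False, str(exc)))
        return results

    # L7: one plain-HTTP GET, roughly `curl -v http://host:port/path`.
    # A 4xx still proves an HTTP server answered; a 5xx is treated as a failure.
    try:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        try:
            conn.request("GET", path)
            status = conn.getresponse().status
        finally:
            conn.close()
        results.append(("l7", status < 500, f"HTTP {status}"))
    except (OSError, http.client.HTTPException) as exc:
        results.append(("l7", False, str(exc)))
    return results
```

A result that ends in a failed l4 entry points at the listener, a security group, or a NACL rather than at the application. Security group and NACL review still comes last.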
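The Terraform and Helm ordering rules share one property: each gate runs only if every earlier gate passed. A minimal Python sketch of that fail-fast runner follows. It assumes terraform init has already run, that conftest finds its policies in its default policy/ directory, and that checkov is the chosen scanner. The chart path ./chart, the rendered/ directory, and the release name my-release are hypothetical placeholders.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Optional, Sequence

# Gate order from the Terraform rule (tfsec could replace checkov in slot 5).
TERRAFORM_GATES = [
    ["terraform", "fmt", "-check", "-recursive"],
    ["terraform", "validate"],
    ["conftest", "test", "."],
    ["tflint", "--recursive"],
    ["checkov", "-d", "."],
    ["terraform", "plan"],
]

# Gate order from the Helm rule; chart path, output dir, release are placeholders.
HELM_GATES = [
    ["helm", "lint", "--strict", "./chart"],
    ["helm", "template", "./chart", "--debug", "--output-dir", "rendered"],
    ["kubeconform", "-strict", "-summary", "rendered"],
    ["checkov", "-d", "rendered"],
    ["helm", "test", "my-release"],
]


def run_command(cmd: Sequence[str]) -> int:
    """Run one gate and return its exit code; a missing binary counts as a failure."""
    try:
        return subprocess.run(list(cmd), check=False).returncode
    except FileNotFoundError:
        return 127


def run_gates(
    gates: Sequence[Sequence[str]],
    runner: Callable[[Sequence[str]], int] = run_command,
) -> tuple[list[str], Optional[str]]:
    """Run gates in order and stop at the first non-zero exit code.

    Returns (passed, failed): the gates that passed, plus the first failing
    gate or None. Because execution stops early, plan never runs after a failure.
    """
    passed: list[str] = []
    for cmd in gates:
        label = " ".join(cmd)
        if runner(cmd) != 0:
            return passed, label
        passed.append(label)
    return passed, None
```

Suppose run_gates(TERRAFORM_GATES) hits a tflint failure. It returns the three gates that passed and "tflint --recursive" as the failure, and terraform plan is never executed. The injectable runner lets you test the ordering without installing any of the tools.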
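The tag-baseline rule depends on override order. With Terraform's merge(), a key in a later map replaces the same key from an earlier map. AWS provider default_tags behave the same way: a matching key set on the resource takes precedence. The Python sketch below resolves tags in that order and flags missing baseline keys, in the spirit of a Tag Policy or Azure Policy audit. The required key names are hypothetical, since the rule leaves them to each organization.

```python
from typing import Dict, FrozenSet, Mapping, Set

# Hypothetical baseline keys: the rule leaves the exact set to each organization.
REQUIRED_TAG_KEYS: FrozenSet[str] = frozenset({"owner", "environment", "cost-center"})


def effective_tags(common: Mapping[str, str], resource: Mapping[str, str]) -> Dict[str, str]:
    """Resolve tags like merge(local.common_tags, {...}): resource-level keys win."""
    merged = dict(common)
    merged.update(resource)
    return merged


def missing_tags(tags: Mapping[str, str], required: FrozenSet[str] = REQUIRED_TAG_KEYS) -> Set[str]:
    """Return the required keys that are absent or set to an empty value."""
    return {key for key in required if not tags.get(key)}
```

Here is a worked case. Resolving {"owner": "platform", "environment": "prod", "cost-center": "cc-42"} against a resource that sets only {"environment": "staging"} keeps owner and cost-center and yields environment "staging". missing_tags then returns an empty set.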

Structure the Response

For design or implementation work, provide output in this order:

  1. Target architecture and ownership boundaries
  2. Repository or directory layout
  3. Identity, secrets, and promotion model
  4. Validation and deployment workflow
  5. Risks, tradeoffs, and migration path

When asked to generate code, start from the thinnest useful slice that proves the pattern and note which layer remains intentionally out of scope.

Pick the Right Reference Files

Load only the files needed for the current request.

| File | Scope |
| --- | --- |
| references/platform-operating-model.md | Repo topology, ownership boundaries, promotion flow |
| references/terraform.md | Module patterns, environments, state, testing |
| references/kubernetes.md | Cluster baseline, workload, RBAC, policy |
| references/openshift.md | OpenShift routing, SCC, OLM, tenancy |
| references/fluxcd.md | Bootstrap, reconciliation, FluxInstance, ResourceSet, image automation |
| references/fluxcd-sources.md | GitRepository, OCIRepository, HelmRepository, Bucket, ArtifactGenerator |
| references/fluxcd-resourcesets.md | ResourceSet templating, input strategies, gitless fleet patterns |
| references/fluxcd-notifications.md | Provider, Alert, Receiver, Slack/Datadog/GitHub commit status |
| references/fluxcd-operator.md | FluxInstance sizing, multi-tenancy, kustomize patches, FluxReport |
| references/fluxcd-kustomization.md | CEL readyExpr, postBuild substitution, SOPS, SSA annotations |
| references/fluxcd-helmrelease.md | chartRef vs chart.spec, drift detection, post-renderers, CRD lifecycle |
| references/fluxcd-terraform.md | Flux Operator bootstrap via Terraform |
| references/fluxcd-mcp.md | AI-assisted FluxCD debugging via Flux MCP server |
| references/fluxcd-migration.md | v2.7/v2.8 API removals, CLI and Operator upgrade paths |
| references/fluxcd-security.md | Secrets, source auth, OCI supply chain, RBAC, image automation security |
| references/fluxcd-troubleshooting.md | Incident cheat-sheet: symptom → cause → fix per controller |
| references/argocd.md | App delivery, ApplicationSet, sync policies |
| references/aws.md | Landing zones, IAM, EKS patterns |
| references/aws-mcp-profiles.md | AWS MCP profile management: multi-account SSO, Granted, credential_process, context budget, starter kits |
| references/azure.md | Management groups, identity, AKS patterns |
| references/aws-cloudfront.md | CloudFront distributions, OAC, Lambda@Edge, security headers |
| references/aws-waf.md | Web ACLs, managed rules, rate limiting, Firewall Manager |
| references/github-actions.md | Reusable workflows, OIDC, delivery controls |
| references/composite-actions.md | Composite action scaffold, SHA pinning, secrets-as-inputs, actionlint |
| references/secrets.md | External Secrets Operator, Sealed Secrets, secrets strategy |
| references/linkerd.md | mTLS, observability, traffic management, multi-cluster |
| references/linux-networking.md | DNS, load balancing, VPC/VNet, kernel tuning, connectivity |
| references/platform-mindset.md | DevEx, friction audits, RFC/ADR, incident communication, post-mortems |
| references/compliance.md | SOC 2 controls, IAM, encryption, audit logging, Checkov evidence |
| references/helm.md | Chart scaffolding, lint pipeline, values design, GitOps integration |
| references/mcp.md | MCP protocol, SDKs, transports, schema validation, auth, testing |
| references/observability.md | Prometheus, OpenTelemetry, Grafana, alerting, k6, capacity |
| references/documentation.md | Docstrings, OpenAPI 3.1, MkDocs, developer guides |
| references/datadog.md | Agent, APM, monitors, dashboards, SLOs, LLMObs, FluxCD monitoring |
| references/llm-observability.md | LLMObs instrumentation, eval bootstrap, trace RCA |
| references/dynatrace.md | OneAgent, auto-instrumentation, anomaly detection, SLOs, Terraform |
| references/conventional-commits.md | Commit message structure, atomic staging, commitlint, semantic-release |
| references/opa.md | Rego v1 syntax, rule types, unit tests, fmt/regal/verify pipeline |
| references/kyverno.md | ValidatingPolicy, MutatingPolicy, ImageValidatingPolicy, CEL, kyverno-cli |
| references/pr-review.md | Cost, drift, ownership, compliance, deprecated APIs, rollback scoring |
| references/keda.md | ScaledObject, ScaledJob, scalers, TriggerAuthentication, scale-to-zero |
| references/agent-self-improve.md | .learnings/ workspace, WAL, VFM, ADL, status/migrate |
| references/supply-chain.md | Cosign, Syft SBOM, Trivy/Grype, SLSA Level 2, ImageValidatingPolicy |
| references/runtime-security.md | Falco eBPF, custom rules, Falcosidekick, Kyverno enforcement |
| references/chaos.md | Litmus Chaos, Chaos Mesh, steady-state hypothesis, GameDay |
| references/dora.md | Deployment Frequency, Lead Time, CFR, MTTR, Prometheus instrumentation |
| references/awesome-docs.md | Animated SVG Markdown: architecture flow, lifecycle, carousel, timeline |

Slash Commands

For explicit, repeatable workflows use these commands:

  • /platform-skills:debug — structured troubleshooting for any platform symptom
  • /platform-skills:review — production-readiness review of any manifest, Terraform, or workflow
  • /platform-skills:terraform — full fmt/validate/tflint/security pipeline + blast radius review
  • /platform-skills:fluxcd — FluxCD entry point: routes to debug (live cluster issue), audit (repo health check), or helm (chart review) based on your input
  • /platform-skills:gitops debug — Flux CD and Argo CD live cluster troubleshooting (5-workflow structured debug)
  • /platform-skills:gitops audit — Flux CD GitOps repository 6-phase audit (discovery, validation, API compliance, best practices, security, report)
  • /platform-skills:linkerd — Linkerd mTLS, injection, policy, and multi-cluster diagnostics
  • /platform-skills:linux — Linux administration, DNS, load balancing, VPC/VNet, and connectivity troubleshooting
  • /platform-skills:product — product thinking, friction audits, DevEx, RFC/ADR, incident updates, post-mortems
  • /platform-skills:compliance — SOC 2 gap analysis, control implementation, evidence collection, and Checkov remediation for Terraform
  • /platform-skills:helmcheck — Helm chart scaffolding, structural review, and security audit with full lint/validation pipeline
  • /platform-skills:mcp — MCP server/client scaffolding, protocol review, and integration debugging
  • /platform-skills:aws-profile — discover, switch, and validate AWS profiles for MCP servers across VS Code and Claude Code
  • /platform-skills:observability — instrument services, build dashboards, write alerts, run load tests, plan capacity
  • /platform-skills:document — generate docstrings, OpenAPI specs, documentation sites, and getting started guides
  • /platform-skills:datadog — Datadog Agent setup, APM instrumentation, monitors, dashboards, SLOs, pup CLI operations, LLM Observability instrumentation, evaluators, and debugging
  • /platform-skills:dynatrace — OneAgent deployment, instrumentation, anomaly detection, SLOs, and debugging
  • /platform-skills:commit — analyze diff, generate conventional commit message, stage files atomically, validate message
  • /platform-skills:opa — generate Rego policies, write unit tests, run fmt/regal/verify pipeline, explain or debug policies
  • /platform-skills:kyverno — generate, test, audit, debug, or migrate Kyverno CEL-based admission policies
  • /platform-skills:pr-review — comprehensive PR review: cost, drift, ownership, compliance, upgrade, rollback
  • /platform-skills:triage — triage a PR comment (bot or human): classify as ACTIONABLE_FIX / INFORMATIONAL / NOT_APPLICABLE, produce the exact fix if needed, and write the thread reply
  • /platform-skills:keda — design, generate, debug, or review KEDA ScaledObject/ScaledJob autoscaling
  • /platform-skills:self-improve — bootstrap global or project-local .learnings/ workspace (init global/init local), log/review/promote learnings and errors, status overview, and migrate between scopes
  • /platform-skills:supply-chain — sign images, generate and attest SBOMs, run CVE severity gates, enforce image signatures in Kubernetes, and generate SLSA Level 2 provenance
  • /platform-skills:runtime-security — deploy Falco with eBPF, write custom rules, route alerts, debug why a rule is not firing, and bridge Falco signals to Kyverno admission enforcement
  • /platform-skills:chaos — install Litmus Chaos or Chaos Mesh, generate fault experiments, schedule recurring chaos, run structured GameDay, debug stuck experiments, report results
  • /platform-skills:dora — instrument DORA metrics in GitHub Actions, generate Grafana dashboards, benchmark against performance bands, debug missing metric data
  • /platform-skills:awesome-docs — generate any animated Markdown document (README, architecture guide, runbook, tutorial, RFC, post-mortem, or custom), convert existing Markdown to animated, update diagrams, diff for staleness, audit quality, preview locally, or export to Confluence/Notion HTML
  • /platform-skills:aws — CloudFront distributions, WAF web ACLs, Lambda@Edge, CloudFront Functions, Firewall Manager multi-account enforcement, and Terraform module generation with best practices
  • /platform-skills:composite-actions — generate a full composite action repo scaffold, review an existing action.yml, harden with SHA pinning and env isolation, or generate a test workflow
  • /platform-skills:renovate — generate renovate.json for any repo, or emit a GHA workflow to validate it on PR

Working Flux CD examples: examples/fluxcd/
