Production-grade platform engineering handbook — Kubernetes, Terraform, Flux CD, GitHub Actions, AWS, and more.
Status: Stable
Working NodePool, EC2NodeClass, and validation examples for Karpenter v1.x on Amazon EKS.
| File | Purpose |
|---|---|
| nodepool-default-al2023.yaml | General-purpose NodePool — AL2023, mixed Spot/On-Demand, multi-AZ, Graviton included |
| nodepool-spot-flex.yaml | Spot-optimised NodePool — broad instance families, batch workloads, high weight |
| nodepool-critical-ondemand.yaml | On-Demand NodePool for SLA-bound workloads — pinned AMI, conservative disruption, PDB included |
| nodepool-gpu.yaml | GPU NodePool — Bottlerocket AMI, g4dn/g5/p3 families, no disruption |
| ec2nodeclass-private-cluster.yaml | Private cluster EC2NodeClass — explicit API endpoint, CA, and service CIDR in AL2023 userData |
| karpenter-validate.sh | Validation script — offline field checks + kubectl dry-run + live cluster health |
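The offline field checks listed for `karpenter-validate.sh` can be approximated with plain Bash and awk. The sketch below is hypothetical and is not the script's actual contents. `check_manifest` and `expected_api_version` are illustrative names. The only rule it checks is that top-level `NodePool` and `EC2NodeClass` documents declare the Karpenter v1 API groups, `karpenter.sh/v1` and `karpenter.k8s.aws/v1`. Parsing is deliberately naive: it assumes unquoted, unindented `kind:` and `apiVersion:` keys.

```bash
#!/usr/bin/env bash
# Hypothetical offline check, NOT the contents of karpenter-validate.sh.
# Verifies that NodePool / EC2NodeClass documents use the Karpenter v1 API groups.

# Expected apiVersion for each Karpenter v1.x kind; other kinds return "".
expected_api_version() {
  case "$1" in
    NodePool)     echo "karpenter.sh/v1" ;;
    EC2NodeClass) echo "karpenter.k8s.aws/v1" ;;
    *)            echo "" ;;
  esac
}

# check_manifest FILE: prints OK/FAIL per document, returns 1 on any mismatch.
check_manifest() {
  local file="$1" status=0 kind api want
  while read -r kind api; do
    want="$(expected_api_version "$kind")"
    # Skip kinds we do not check (e.g. a PodDisruptionBudget bundled in the file).
    if [[ -z "$want" ]]; then
      continue
    fi
    if [[ "$api" == "$want" ]]; then
      echo "OK   $file $kind $api"
    else
      echo "FAIL $file $kind $api (want $want)"
      status=1
    fi
  done < <(awk '
    # Split multi-document YAML on "---" and emit one "kind apiVersion" pair per document.
    # Only unindented keys match, so nested fields like nodeClassRef.kind are ignored.
    /^---/         { if (k != "") print k, a; k = ""; a = ""; next }
    /^kind:/       { k = $2 }
    /^apiVersion:/ { a = $2 }
    END            { if (k != "") print k, a }
  ' "$file")
  return "$status"
}
```

For example, `for f in nodepool-*.yaml ec2nodeclass-*.yaml; do check_manifest "$f"; done` prints one OK or FAIL line per NodePool or EC2NodeClass document.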
```bash
# Install Karpenter (OCI chart, not the deprecated charts.karpenter.sh repo)
helm upgrade --install karpenter \
  oci://public.ecr.aws/karpenter/karpenter \
  --version "1.12.1" \
  --namespace karpenter \
  --create-namespace \
  --set "settings.clusterName=my-cluster" \
  --set "settings.interruptionQueue=karpenter-my-cluster" \
  --set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=<controller-role-arn>" \
  --wait

# Apply a NodePool and EC2NodeClass
kubectl apply -f nodepool-default-al2023.yaml

# Check status
kubectl get nodepool
kubectl describe nodepool default

# Validate examples
bash karpenter-validate.sh
```

NodePool selection combines weight and taint/toleration matching:
| NodePool | Weight | Capacity / use | How pods reach it |
|---|---|---|---|
| spot-flex | 100 | Spot + On-Demand, batch | Requires a `spot-flex` toleration (NoSchedule taint) |
| default | 10 | Mixed Spot/On-Demand | Matches most pods with no special constraints |
| critical-ondemand | 5 | On-Demand only, SLA | Opt-in via `nodeSelector: karpenter.sh/capacity-type: on-demand` |
| gpu | 5 | GPU only | Requires an `nvidia.com/gpu` toleration (NoSchedule taint) |

Karpenter selects the highest-weight NodePool whose requirements the pod satisfies and whose taints the pod tolerates:
- `spot-flex`: only reachable by pods with a `spot-flex` toleration.
- `default`: untainted mixed Spot/On-Demand pool; matches most general pods with no special constraints.
- `critical-ondemand`: weight 5 (below `default`); only selected when a pod explicitly sets `nodeSelector: { karpenter.sh/capacity-type: on-demand }`.
- `gpu`: only reachable by pods with an `nvidia.com/gpu` toleration.

All examples give nodes an EC2 instance profile (`instanceProfile: karpenter-node-profile`). The Karpenter controller itself uses either EKS Pod Identity (recommended) or IRSA for its own AWS API calls.
See references/karpenter.md for the full IAM policy, interruption queue setup, and Pod Identity vs IRSA comparison.
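The weight-then-taint rule can be sketched as a small Bash function. This is a simplified model, not Karpenter's scheduler. It orders pools by weight and returns the first one whose taint the pod tolerates and whose GPU capability meets the pod's need. The pool list, the taint keys, and the hypothetical `gpu` column (assuming only the `gpu` pool offers GPU instance types) are illustrative assumptions, not values read from the YAML files. `critical-ondemand` is left out because capacity-type requirements are not modelled.

```bash
#!/usr/bin/env bash
# Simplified model of weight-ordered NodePool selection (illustration only).
# Real Karpenter also evaluates requirements, limits, and instance availability.

# "name weight taint-key offers-gpu"; "-" means untainted.
# Taint keys and the gpu column are ASSUMPTIONS, not read from the YAML files.
POOLS=(
  "spot-flex 100 spot-flex no"
  "default 10 - no"
  "gpu 5 nvidia.com/gpu yes"
)

# pick_nodepool NEEDS_GPU [TOLERATION_KEY ...]
# NEEDS_GPU is 1 if the pod requests GPUs, else 0. Prints the chosen pool or "none".
pick_nodepool() {
  local needs_gpu="$1"; shift
  local tolerations=" $* "
  local name weight taint gpu
  # Walk pools from highest to lowest weight (numeric sort on field 2).
  while read -r name weight taint gpu; do
    # Requirement check: a GPU pod can only use a pool that offers GPU instances.
    if [[ "$needs_gpu" == 1 && "$gpu" != yes ]]; then
      continue
    fi
    # Taint check: untainted pools accept any pod; tainted ones need a matching toleration.
    if [[ "$taint" == "-" || "$tolerations" == *" $taint "* ]]; then
      echo "$name"
      return 0
    fi
  done < <(printf '%s\n' "${POOLS[@]}" | sort -k2,2nr)
  echo "none"
  return 1
}
```

For example, `pick_nodepool 0 spot-flex` prints `spot-flex`, `pick_nodepool 1 nvidia.com/gpu` prints `gpu`, and `pick_nodepool 0` prints `default`. A toleration alone does not attract a pod: `pick_nodepool 0 nvidia.com/gpu` still prints `default`. On a live cluster, `kubectl get nodes -L karpenter.sh/nodepool,karpenter.sh/capacity-type` shows which NodePool and capacity type each node actually came from.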