Production-grade platform engineering handbook — Kubernetes, Terraform, Flux CD, GitHub Actions, AWS, and more.
Design, debug, and review KEDA (Kubernetes Event-Driven Autoscaling) ScaledObject and ScaledJob resources.
Write a production-ready ScaledObject or ScaledJob from a description.
Steps:
Choose the resource kind: ScaledObject or ScaledJob
Set apiVersion: keda.sh/v1alpha1
Set scaleTargetRef.name matching the exact Deployment/StatefulSet/Job name
Use minReplicaCount: 0 only if the service can tolerate cold-start latency; otherwise minReplicaCount: 1
Set pollingInterval and cooldownPeriod sized to the workload pattern (see table in references/keda.md)
Write trigger metadata with activationThreshold/activationQueueLength set to avoid flapping on sparse events
Use authenticationRef pointing to a TriggerAuthentication; never inline credentials in the ScaledObject
Prefer pod identity (podIdentity.provider: aws) over static key/secret pairs
Include the matching TriggerAuthentication or ClusterTriggerAuthentication
Verify with kubectl describe scaledobject <name> -n <ns> and the expected Active: true condition
Reference: references/keda.md → ScaledObject, ScaledJob, TriggerAuthentication
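As an illustration of the checklist for this task, here is a minimal sketch of a ScaledObject paired with a TriggerAuthentication. It assumes an SQS-driven worker; the names (orders-worker, orders), the queue URL, the region, and every numeric value are hypothetical placeholders, not recommendations. Take real values from the workload and from the table in references/keda.md.

```yaml
# Sketch only: all names, URLs, and numbers below are hypothetical.
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: orders-worker-aws
  namespace: orders
spec:
  podIdentity:
    provider: aws              # pod identity instead of static key/secret pairs
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-worker
  namespace: orders
spec:
  scaleTargetRef:
    name: orders-worker        # must match the Deployment name exactly
  minReplicaCount: 0           # assumes this worker tolerates cold-start latency
  maxReplicaCount: 20
  pollingInterval: 30          # seconds between scaler checks
  cooldownPeriod: 300          # seconds to wait before scaling back to zero
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: orders-worker-aws    # no credentials inline in the ScaledObject
      metadata:
        queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/orders
        awsRegion: eu-west-1
        queueLength: "10"          # target messages per replica
        activationQueueLength: "5" # stay at zero until the queue exceeds this
```

After applying, kubectl describe scaledobject orders-worker -n orders should show the conditions described above once messages arrive.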
Diagnose why a ScaledObject or ScaledJob is not scaling as expected.
Steps:
kubectl get scaledobject -n <namespace>
kubectl describe scaledobject <name> -n <namespace>
kubectl get hpa -n <namespace>
kubectl describe hpa keda-hpa-<scaledobject-name> -n <namespace>
kubectl logs -n keda -l app.kubernetes.io/name=keda-operator --tail=100
kubectl logs -n keda -l app.kubernetes.io/name=keda-operator-metrics-apiserver --tail=100
Check the Conditions block:
Active: false → scaler is not detecting events (check trigger config and auth)
Ready: false → KEDA cannot reach the target deployment (check scaleTargetRef.name)
Fallback: true → metric fetch is failing; KEDA is using fallback replicas
<unknown> targets on the HPA → the metrics adapter cannot reach the external source
Is activationThreshold too high for current event volume?
Is minReplicaCount preventing scale-to-zero?
Has cooldownPeriod not yet elapsed?
Reference: references/keda.md → Troubleshooting
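When the Fallback condition is true, the replica count KEDA holds comes from the fallback stanza of the ScaledObject. The sketch below shows where that stanza sits, assuming a Prometheus trigger; the resource name, server address, query, and thresholds are hypothetical. Note that fallback applies to ScaledObject only and does not cover cpu or memory triggers.

```yaml
# Sketch only: names, address, query, and numbers are hypothetical.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-api
  namespace: orders
spec:
  scaleTargetRef:
    name: orders-api
  minReplicaCount: 1
  maxReplicaCount: 10
  fallback:
    failureThreshold: 3        # consecutive failed metric fetches before fallback applies
    replicas: 2                # replica count held while Fallback is true
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(rate(http_requests_total{app="orders-api"}[2m]))
        threshold: "100"
        activationThreshold: "5"
```

If describe shows Fallback: true with this configuration, the Deployment staying at 2 replicas is expected behavior, and the fix belongs on the metric source or trigger auth rather than on the replica settings.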
Review an existing ScaledObject or ScaledJob for correctness, security, and operational safety.
Review in this priority order:
Correctness
Does scaleTargetRef.name match an existing Deployment/StatefulSet?
Are minReplicaCount and maxReplicaCount consistent with the workload SLA?
Is pollingInterval appropriate for the trigger type (SQS charges per API call)?
Does trigger metadata use the right field names for this scaler version?
Security
Operational safety
Is minReplicaCount: 0 justified? What is the cold-start latency?
Is cooldownPeriod long enough to prevent thrashing?
Is advanced.restoreToOriginalReplicaCount: true set if the ScaledObject may be deleted?
Are activationThreshold/activationQueueLength set to prevent premature activation on sparse events?
Is scaleDown.stabilizationWindowSeconds configured for latency-sensitive services?
GitOps safety
Separate findings into:
Reference: references/keda.md → Security Patterns, Scaling Lifecycle
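For the operational-safety checks, it helps to know where the reviewed fields live in the spec. The sketch below assumes a Prometheus-triggered Deployment; the names, address, query, and all numeric values are hypothetical and only show field placement under advanced.

```yaml
# Sketch only: names, address, query, and numbers are hypothetical.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: checkout-api
  namespace: shop
spec:
  scaleTargetRef:
    name: checkout-api
  minReplicaCount: 2
  maxReplicaCount: 30
  advanced:
    restoreToOriginalReplicaCount: true    # restore the Deployment's replica count if this ScaledObject is deleted
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300  # wait 5 minutes of lower demand before removing pods
          policies:
            - type: Percent
              value: 50                    # remove at most half the pods per period
              periodSeconds: 60
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(rate(http_requests_total{app="checkout-api"}[2m]))
        threshold: "100"
```

The behavior block is passed through to the HPA that KEDA manages, so the usual HPA scaling-policy semantics apply.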
Design the scaling strategy for a workload from requirements.
Steps:
Consult references/keda.md
Set pollingInterval, cooldownPeriod, and minReplicaCount based on the workload pattern
Reference: references/keda.md → KEDA vs HPA decision matrix, Scalers
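If the requirements describe run-to-completion work (one job per message or batch) rather than a long-running service, the design may land on a ScaledJob. The following is a minimal sketch under that assumption; the names, image, queue URL, and limits are hypothetical, and it presumes a TriggerAuthentication named report-builder-aws already exists. Unlike a ScaledObject, a ScaledJob embeds a Job template under jobTargetRef and has no cooldownPeriod.

```yaml
# Sketch only: names, image, URL, and numbers are hypothetical.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: report-builder
  namespace: reports
spec:
  jobTargetRef:
    backoffLimit: 2
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: worker
            image: registry.example.com/report-builder:1.0.0
  pollingInterval: 30            # seconds between queue checks
  maxReplicaCount: 10            # upper bound on concurrently running Jobs
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 5
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: report-builder-aws
      metadata:
        queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/reports
        awsRegion: eu-west-1
        queueLength: "1"         # roughly one Job per queued message
```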