Comprehensive Kubernetes toolkit for YAML generation, validation, and cluster debugging
94
94%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Advisory
Suggest reviewing before use
When facing any Kubernetes issue, follow this systematic approach:
Kubernetes issues typically fall into these categories:
Application Layer → Application crashes, errors, bugs
Pod Layer → Pod not starting, restarting, pending
Service Layer → Network connectivity, DNS issues
Node Layer → Node not ready, resource exhaustion
Cluster Layer → Control plane issues, API problems
Storage Layer → Volume mount failures, PVC issues
Configuration Layer → ConfigMap, Secret, RBAC issues# What's the current state?
kubectl get pods -n <namespace>
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
# Quick status check
kubectl describe pod <pod-name> -n <namespace>Follow the appropriate workflow based on pod state:
1. kubectl describe pod → Check events section
↓
2. Check scheduling issues:
- Insufficient resources? → kubectl top nodes
- Node selector issues? → Check nodeSelector in pod spec
- Taints/tolerations? → kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
- PVC pending? → kubectl get pvc -n <namespace>
↓
3. Take action:
- Add nodes or free resources
- Adjust node selector
- Add tolerations
- Fix PVC/PV binding1. kubectl logs <pod> --previous
↓
2. Analyze crash reason:
- Application error? → Fix code/config
- Missing dependencies? → Check env vars, volumes, secrets
- Resource limits? → kubectl describe pod → Check OOMKilled
- Failed health checks? → Check liveness/readiness probe settings
↓
3. Common checks:
kubectl get pod <pod> -o yaml | grep -A 10 env
kubectl get pod <pod> -o yaml | grep -A 10 volumeMounts
kubectl get pod <pod> -o yaml | grep -A 10 livenessProbe
↓
4. Fix and verify:
- Update deployment/pod spec
- kubectl apply -f updated-config.yaml
- Watch: kubectl get pods -w1. kubectl describe pod → Find exact error
↓
2. Verify image:
- Does image exist? → docker pull <image> (test locally)
- Correct tag? → Check deployment spec
- Private registry? → Check imagePullSecrets
↓
3. Fix authentication (if needed):
kubectl create secret docker-registry <secret> \
--docker-server=<server> \
--docker-username=<user> \
--docker-password=<pass>
↓
4. Update pod spec with imagePullSecrets
↓
5. Verify:
kubectl get pods -w1. Verify service exists:
kubectl get svc <service-name> -n <namespace>
↓
2. Check endpoints:
kubectl get endpoints <service-name> -n <namespace>
↓
No endpoints? → Check selector matches pod labels
↓
3. Test DNS resolution:
kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot -- /bin/bash
nslookup <service-name>.<namespace>.svc.cluster.local
↓
DNS fails? → Check CoreDNS pods and logs
↓
4. Test connectivity:
curl <service-name>.<namespace>.svc.cluster.local:<port>
↓
Connection fails? → Check:
- Network policies: kubectl get networkpolicies -n <namespace>
- Target port matches pod port
- Pod is ready: kubectl get pods -n <namespace>
↓
5. Check from outside cluster (if applicable):
- LoadBalancer service? → Check external IP assigned
- Ingress? → kubectl get ingress -n <namespace>
- NodePort? → Access via <node-ip>:<nodePort>1. Test DNS from problem pod:
kubectl exec <pod> -n <namespace> -- nslookup kubernetes.default
↓
2. Check CoreDNS health:
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns
↓
3. Verify DNS service:
kubectl get svc -n kube-system kube-dns
kubectl get endpoints -n kube-system kube-dns
↓
4. Check pod DNS config:
kubectl exec <pod> -n <namespace> -- cat /etc/resolv.conf
↓
5. Fix if needed:
- Restart CoreDNS: kubectl rollout restart -n kube-system deployment/coredns
- Check network policies allow DNS (port 53)
- Verify kubelet configuration1. Identify resource hog:
kubectl top nodes
kubectl top pods --all-namespaces
↓
2. Check specific pod:
kubectl top pod <pod-name> -n <namespace> --containers
kubectl describe pod <pod-name> -n <namespace> | grep -A 10 "Limits"
↓
3. Analyze application:
- Memory leak? → Check logs for errors
- CPU spike? → Profile application
- Check resource requests/limits appropriate?
↓
4. Take action:
- Increase limits if legitimate usage
- Fix application if bug/leak
- Implement HPA if scaling needed
- Add resource quotas to prevent overconsumption1. Check node status:
kubectl get nodes
kubectl describe node <node-name>
↓
2. Look for pressure conditions:
- MemoryPressure
- DiskPressure
- PIDPressure
↓
3. Check node resources:
kubectl top node <node-name>
↓
4. Find resource consumers:
kubectl describe node <node-name> | grep -A 20 "Allocated resources"
↓
5. Actions:
- Evict non-critical pods
- Add more nodes
- Adjust resource requests/limits
- Clean up disk space if DiskPressure1. Check PVC status:
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>
↓
2. Check for matching PV:
kubectl get pv
↓
No matching PV? → Check:
- Storage class exists: kubectl get storageclass
- Dynamic provisioner working
- Manual PV needed?
↓
3. Verify storage class:
kubectl describe storageclass <class-name>
↓
4. Check provisioner logs (if dynamic):
kubectl logs -n kube-system <provisioner-pod>
↓
5. Fix:
- Create matching PV (static)
- Fix storage class configuration (dynamic)
- Verify provisioner is running1. Check rollout status:
kubectl rollout status deployment/<name> -n <namespace>
↓
2. Check replica sets:
kubectl get rs -n <namespace>
kubectl describe rs <new-replicaset> -n <namespace>
↓
3. Check new pod status:
kubectl get pods -n <namespace> -l app=<app-label>
↓
Pods failing? → Follow pod troubleshooting workflow
↓
4. Check rollout strategy:
kubectl get deployment <name> -n <namespace> -o yaml | grep -A 10 strategy
↓
5. Options:
- Fix pod issues and rollout will continue
- Pause rollout: kubectl rollout pause deployment/<name>
- Rollback: kubectl rollout undo deployment/<name>
- Check revision history: kubectl rollout history deployment/<name># Pod debugging
kubectl get pods -n <namespace> -o wide
kubectl describe pod <pod> -n <namespace>
kubectl logs <pod> -n <namespace> [-c container]
kubectl logs <pod> -n <namespace> --previous
kubectl exec <pod> -n <namespace> -it -- /bin/sh
# Service debugging
kubectl get svc -n <namespace>
kubectl get endpoints -n <namespace>
kubectl describe svc <service> -n <namespace>
# Events
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
# Resource usage
kubectl top nodes
kubectl top pods -n <namespace>
# Network debugging
kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot -- /bin/bash
# Cluster health
kubectl get nodes
kubectl cluster-info
kubectl get componentstatuses# Delete stuck pod
kubectl delete pod <pod> -n <namespace> --force --grace-period=0
# Restart deployment
kubectl rollout restart deployment/<name> -n <namespace>
# Rollback deployment
kubectl rollout undo deployment/<name> -n <namespace>
# Cordon node (prevent new pods)
kubectl cordon <node-name>
# Drain node (evict pods)
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data