Kubernetes clusters, pods, nodes, workloads, storage, networking, and resource relationships. Query K8s inventory, diagnose degraded deployments and pod failures, investigate rollouts, audit ingress and network policies.
Monitor and analyze Kubernetes infrastructure using Dynatrace DQL. Query cluster resources, monitor workload health, analyze pod placement, optimize costs, and assess security posture.
| File | Contents |
|---|---|
| references/cluster-inventory.md | Clusters, namespaces, resource distribution |
| references/labels-annotations.md | Labels, annotations, `k8s.object` parsing patterns |
| references/pod-node-placement.md | Node selectors, affinity, taints, HA scheduling |
| references/pod-debugging.md | Exit codes, pod conditions, init containers, image pull errors, logs, service→pod drill-down |
| references/workload-health.md | Degraded deployments, stuck rollouts, node conditions, CPU throttling, HPA, StatefulSet ordering |
| references/pv-pvc.md | PVC/PV lifecycle, phase reference, orphaned volumes, StorageClass |
| references/ingress.md | Routing rule parsing, TLS audit |
| references/network-policies.md | Policy listing, namespace isolation audit |
- Workloads: `K8S_DEPLOYMENT`, `K8S_STATEFULSET`, `K8S_DAEMONSET`, `K8S_JOB`, `K8S_CRONJOB`, `K8S_HORIZONTALPODAUTOSCALER`
- Infrastructure: `K8S_CLUSTER`, `K8S_NAMESPACE`, `K8S_NODE`, `K8S_POD`
- Configuration: `K8S_SERVICE`, `K8S_CONFIGMAP`, `K8S_SECRET`, `K8S_PERSISTENTVOLUMECLAIM`, `K8S_PERSISTENTVOLUME`, `K8S_INGRESS`, `K8S_NETWORKPOLICY`
`smartscapeNodes` - Query K8s entities:

```
smartscapeNodes K8S_POD
| filter k8s.namespace.name == "production"
| fields k8s.cluster.name, k8s.pod.name
```

`timeseries` - Monitor metrics over time:

```
timeseries cpu = sum(dt.kubernetes.container.cpu_usage),
    by: {k8s.pod.name, k8s.namespace.name}
| fieldsAdd avg_cpu = arrayAvg(cpu)
```

`fetch logs` - Analyze log events:

```
fetch logs
| filter k8s.namespace.name == "production" and loglevel == "ERROR"
```

Key fields:

- `k8s.cluster.name`, `k8s.namespace.name`, `k8s.pod.name`, `k8s.node.name`
- `k8s.workload.name`, `k8s.workload.kind`, `k8s.container.name`
- `k8s.object` - Full JSON configuration for deep inspection
- `tags[label]` - Access labels and annotations

Key metrics:

- CPU: `dt.kubernetes.container.cpu_usage`, `cpu_throttled`, `limits_cpu`, `requests_cpu`
- Memory: `dt.kubernetes.container.memory_working_set`, `limits_memory`, `requests_memory`
- Operations: `dt.kubernetes.container.restarts`, `oom_kills`
- Node: `dt.kubernetes.node.pods_allocatable`, `cpu_allocatable`, `memory_allocatable`, `dt.kubernetes.pods`
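The reducers used throughout this skill (`arrayAvg`, `arraySum`) collapse the per-bucket value arrays that `timeseries` returns into scalars. A minimal Python sketch of that reduction, using hypothetical values and assuming empty (null) buckets are skipped:

```python
# Sketch of how arrayAvg/arraySum-style reducers turn a timeseries
# array into the scalars used in filters. Data is hypothetical;
# null entries stand in for empty time buckets.

def array_avg(series):
    vals = [v for v in series if v is not None]  # skip empty buckets
    return sum(vals) / len(vals) if vals else None

def array_sum(series):
    return sum(v for v in series if v is not None)

# One pod's CPU usage per bucket (millicores), with one empty bucket:
cpu = [120.0, 80.0, None, 100.0]
requests_cpu = [200.0, 200.0, 200.0, 200.0]

avg_cpu = array_avg(cpu)                              # 100.0
usage_pct = avg_cpu / array_avg(requests_cpu) * 100   # 50.0
print(avg_cpu, usage_pct)
```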
K8S_POD vs CONTAINER: these are different entity types in Dynatrace.

- K8S_POD — K8s-native entities with `k8s.object` JSON, scheduling state, conditions, and K8s metrics. Use this skill.
- CONTAINER — Host-level container inventory (image, lifetime, host assignment). Use the dt-obs-hosts skill instead.

The smartscape edge is `CONTAINER --(is_part_of)--> K8S_POD`. To reach containers from a pod, traverse backward:

```
smartscapeNodes K8S_POD
| filter k8s.namespace.name == "<namespace>"
| traverse edgeTypes: {is_part_of}, targetTypes: {CONTAINER}, direction: backward, fieldsKeep: {id}
| fields k8s.cluster.name, k8s.namespace.name, k8s.pod.name, container.id=id
```

No direct smartscape edge exists between SERVICE and K8S_POD. The correlation key is the shared dimension `k8s.workload.name`. See Service → Pod Drill-Down in references/pod-debugging.md for the full two-step pattern.
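The workload-name correlation amounts to a plain join over two result sets. A sketch in Python, where the rows are hypothetical stand-ins for a SERVICE query result and a K8S_POD query result:

```python
# Sketch of the SERVICE -> K8S_POD correlation via the shared
# k8s.workload.name dimension (no direct smartscape edge exists).
# Rows below are hypothetical.

services = [
    {"service.name": "checkout", "k8s.workload.name": "checkout-api"},
]
pods = [
    {"k8s.pod.name": "checkout-api-6f9c-abcde", "k8s.workload.name": "checkout-api"},
    {"k8s.pod.name": "payments-5d8b-fghij", "k8s.workload.name": "payments"},
]

# Step 1: collect workload names for the services of interest.
workloads = {s["k8s.workload.name"] for s in services}

# Step 2: filter pods by those workload names.
matched = [p["k8s.pod.name"] for p in pods if p["k8s.workload.name"] in workloads]
print(matched)  # ['checkout-api-6f9c-abcde']
```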
List all clusters:

```
smartscapeNodes K8S_CLUSTER
| fields k8s.cluster.name, k8s.cluster.version, k8s.cluster.distribution
```

Check node capacity:
```
timeseries {
    current_pods = avg(dt.kubernetes.pods),
    max_pods = avg(dt.kubernetes.node.pods_allocatable)
  }, by: {k8s.node.name, k8s.cluster.name}
| fieldsAdd pod_capacity_pct = (arrayAvg(current_pods) / arrayAvg(max_pods)) * 100
| filter pod_capacity_pct > 80
```

Identify pods in a non-Running state:
```
smartscapeNodes K8S_POD
| parse k8s.object, "JSON:config"
| fieldsAdd phase = config[status][phase]
| filter phase != "Running"
| fields k8s.cluster.name, k8s.namespace.name, k8s.pod.name, phase
```

Find over-provisioned pods (usage < 30%):
```
timeseries {
    cpu_usage = sum(dt.kubernetes.container.cpu_usage),
    cpu_requests = avg(dt.kubernetes.container.requests_cpu)
  }, by: {k8s.pod.name, k8s.namespace.name, k8s.cluster.name}
| fieldsAdd usage_pct = (arrayAvg(cpu_usage) / arrayAvg(cpu_requests)) * 100
| filter usage_pct < 30 and arrayAvg(cpu_requests) > 0
```

Identify containers without limits:
```
smartscapeNodes K8S_POD
| parse k8s.object, "JSON:config"
| expand container = config[spec][containers]
| fieldsAdd
    container_name = container[name],
    cpu_limit = container[resources][limits][cpu],
    memory_limit = container[resources][limits][memory]
| filter isNull(cpu_limit) or isNull(memory_limit)
```

Find pods with OOMKills:
```
timeseries oom_kills = sum(dt.kubernetes.container.oom_kills),
    by: {k8s.pod.name, k8s.namespace.name, k8s.cluster.name}
| filter arraySum(oom_kills) > 0
| fieldsAdd total_oom_kills = arraySum(oom_kills)
| sort total_oom_kills desc
```

Analyze pod restart patterns:
```
timeseries restarts = sum(dt.kubernetes.container.restarts),
    by: {k8s.pod.name, k8s.namespace.name, k8s.cluster.name}
| fieldsAdd total_restarts = arraySum(restarts)
| filter total_restarts > 5
```

Identify privileged containers:
```
smartscapeNodes K8S_POD
| parse k8s.object, "JSON:config"
| expand container = config[spec][containers]
| fieldsAdd
    container_name = container[name],
    privileged = container[securityContext][privileged]
| filter privileged == true
```

Find containers running as root:
```
smartscapeNodes K8S_POD
| parse k8s.object, "JSON:config"
| expand container = config[spec][containers]
| fieldsAdd
    container_name = container[name],
    run_as_user = container[securityContext][runAsUser],
    run_as_non_root = container[securityContext][runAsNonRoot]
| filter (isNull(run_as_user) or run_as_user == 0) and run_as_non_root != true
```

Verify pod distribution (HA compliance):
```
smartscapeNodes K8S_POD
| filter k8s.workload.kind == "deployment"
| summarize pod_count = count(),
    node_count = countDistinct(k8s.node.name),
    by: {k8s.cluster.name, k8s.namespace.name, k8s.workload.name}
| fieldsAdd ha_compliant = node_count > 1
| filter pod_count >= 2 and not ha_compliant
```

Find active DAVIS problems affecting K8s entities:
```
fetch dt.davis.problems, from:now() - 2h
| filter not(dt.davis.is_duplicate) and event.status == "ACTIVE"
| filter matchesPhrase(smartscape.affected_entity.types, "K8S_")
| fields display_id, event.name, event.category, smartscape.affected_entity.ids
```

Use the entries in `smartscape.affected_entity.ids` (an array of Smartscape IDs) to look up each affected entity by its Smartscape ID.
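The follow-up lookup can be sketched as: keep problems that touch any K8S_* entity type, then flatten their ID arrays to feed into a `smartscapeNodes` query. The problem records below are hypothetical:

```python
# Sketch of collecting Smartscape IDs from smartscape.affected_entity.ids
# for problems affecting K8s entity types. Records are hypothetical.

problems = [
    {"display_id": "P-101", "event.status": "ACTIVE",
     "smartscape.affected_entity.types": ["K8S_POD"],
     "smartscape.affected_entity.ids": ["POD-1", "POD-2"]},
    {"display_id": "P-102", "event.status": "ACTIVE",
     "smartscape.affected_entity.types": ["HOST"],
     "smartscape.affected_entity.ids": ["HOST-9"]},
]

# Keep K8s-affecting problems, then flatten the ID arrays.
k8s_ids = []
for p in problems:
    if any(t.startswith("K8S_") for t in p["smartscape.affected_entity.types"]):
        k8s_ids.extend(p["smartscape.affected_entity.ids"])
print(k8s_ids)  # ['POD-1', 'POD-2']
```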
- Use a `limit` for exploration

Unavailable Metrics:

Query Considerations:

- Avoid parsing the `k8s.object` field if not necessary
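Several of the `k8s.object` audits above (missing limits, root user) can be rehearsed offline against `kubectl get pod -o json` style output. This is an illustrative Python sketch with a hypothetical pod document, not part of the skill itself:

```python
import json

# Offline sketch of the k8s.object audits: flag containers without
# CPU/memory limits, and containers that may run as root
# (runAsUser missing or 0, and runAsNonRoot not set to true).
# The pod document below is hypothetical.

pod = json.loads("""{
  "metadata": {"name": "demo"},
  "status": {"phase": "Pending"},
  "spec": {"containers": [
    {"name": "app", "resources": {"limits": {"cpu": "500m"}},
     "securityContext": {"runAsUser": 0}}
  ]}
}""")

findings = []
for c in pod["spec"]["containers"]:
    limits = c.get("resources", {}).get("limits", {})
    if "cpu" not in limits or "memory" not in limits:
        findings.append((c["name"], "missing limits"))
    sc = c.get("securityContext", {})
    # Mirrors the DQL filter: (isNull(runAsUser) or runAsUser == 0)
    # and runAsNonRoot != true
    if sc.get("runAsUser", 0) == 0 and sc.get("runAsNonRoot") is not True:
        findings.append((c["name"], "may run as root"))
print(findings)  # [('app', 'missing limits'), ('app', 'may run as root')]
```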