CtrlK
BlogDocsLog inGet started
Tessl Logo

dt-obs-hosts

Host and process metrics including CPU, memory, disk, network, containers, and process-level telemetry. Monitor infrastructure health and resource utilization.

49

Quality

52%

Does it follow best practices?

Impact

No eval scenarios have been run

SecuritybySnyk

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./skills/dt-obs-hosts/SKILL.md
SKILL.md
Quality
Evals
Security

Infrastructure Hosts Skill

Monitor and manage host and process infrastructure including CPU, memory, disk, network, and technology inventory.

What This Skill Does

  • Discover and inventory hosts across cloud and on-premise environments
  • Monitor host resource utilization (CPU, memory, disk, network)
  • Track process resource consumption and lifecycle
  • Analyze container and Kubernetes infrastructure
  • Discover services via listening ports
  • Manage technology stack versions and compliance
  • Attribute infrastructure costs by cost center and product
  • Validate data quality and metadata completeness
  • Plan capacity and detect resource saturation
  • Correlate infrastructure health across layers

When to Use This Skill

Use this skill when the user needs to:

  • Inventory: "Show me all Linux hosts in AWS us-east-1"
  • Monitor: "What hosts have high CPU usage?"
  • Troubleshoot: "Which processes are consuming the most memory?"
  • Discover: "What databases are running in production?"
  • Plan: "Track Kubernetes version distribution for upgrade planning"
  • Cost: "Calculate infrastructure costs by cost center"
  • Security: "Find all processes listening on port 22"
  • Compliance: "Identify hosts running EOL Java versions"
  • Quality: "Check data completeness for AWS hosts"
  • Optimize: "Find rightsizing candidates based on utilization"

Core Concepts

Entities

  • HOST - Physical or virtual machines (cloud or on-premise)
  • PROCESS - Running processes and process groups
  • CONTAINER - Kubernetes containers
  • NETWORK_INTERFACE - Host network interfaces
  • DISK - Host disk volumes

Metrics Categories

  1. Host Metrics - dt.host.cpu.*, dt.host.memory.*, dt.host.disk.*, dt.host.net.*
  2. Process Metrics - dt.process.cpu.*, dt.process.memory.*, dt.process.io.*, dt.process.network.*
  3. Inventory - OS type, cloud provider, technology stack, versions
  4. Cost - dt.cost.costcenter, dt.cost.product
  5. Quality - Metadata completeness, version compliance

Alert Thresholds

  • CPU/Memory/Disk: 80% warning, 90% critical
  • Network: >70% high, >85% saturated
  • Disk Latency: >20ms bottleneck
  • Network Errors: Drop rate >1%, error rate >0.1%
  • Swap: >30% warning, >50% critical

Key Workflows

1. Host Discovery and Classification

Discover hosts, classify by OS/cloud, inventory resources.

smartscapeNodes "HOST"
| fieldsAdd os.type, cloud.provider, host.logical.cpu.cores, host.physical.memory
| summarize host_count = count(), by: {os.type, cloud.provider}
| sort host_count desc

OS Types: LINUX, WINDOWS, AIX, SOLARIS, ZOS

→ For cloud-specific attributes, see references/inventory-discovery.md

2. Resource Utilization Monitoring

Monitor CPU, memory, disk, network across hosts.

timeseries {
  cpu = avg(dt.host.cpu.usage),
  memory = avg(dt.host.memory.usage),
  disk = avg(dt.host.disk.used.percent)
}, by: {dt.smartscape.host}
| fieldsAdd host_name = getNodeName(dt.smartscape.host)
| filter arrayAvg(cpu) > 80 or arrayAvg(memory) > 80
| sort arrayAvg(cpu) desc

High utilization threshold: 80% warning, 90% critical

→ For detailed CPU analysis, see references/host-metrics.md
→ For memory breakdown, see references/host-metrics.md

3. Process Resource Analysis

Identify top resource consumers at process level.

timeseries {
  cpu = avg(dt.process.cpu.usage),
  memory = avg(dt.process.memory.usage)
}, by: {dt.smartscape.process}
| fieldsAdd process_name = getNodeName(dt.smartscape.process)
| filter arrayAvg(cpu) > 50
| sort arrayAvg(cpu) desc
| limit 20

→ For process I/O analysis, see references/process-monitoring.md
→ For process network metrics, see references/process-monitoring.md

4. Technology Stack Inventory

Discover and track software technologies and versions.

smartscapeNodes "PROCESS"
| fieldsAdd process.software_technologies
| expand tech = process.software_technologies
| fieldsAdd tech_type = tech[type], tech_version = tech[version]
| summarize process_count = count(), by: {tech_type, tech_version}
| sort process_count desc

Common Technologies: Java, Node.js, Python, .NET, databases, web servers, messaging systems

→ For version compliance checks, see references/inventory-discovery.md

5. Service Discovery via Ports

Map listening ports to services for security and inventory.

smartscapeNodes "PROCESS"
| fieldsAdd process.listen_ports, dt.process_group.detected_name
| filter isNotNull(process.listen_ports) and arraySize(process.listen_ports) > 0
| expand port = process.listen_ports
| summarize process_count = count(), by: {port, dt.process_group.detected_name}
| sort toLong(port) asc
| limit 50

Well-known ports: 80 (HTTP), 443 (HTTPS), 22 (SSH), 3306 (MySQL), 5432 (PostgreSQL)

→ For comprehensive port mapping, see references/inventory-discovery.md

6. Container and Kubernetes Monitoring

Track container distribution and K8s workload types.

smartscapeNodes "CONTAINER"
| fieldsAdd k8s.cluster.name, k8s.namespace.name, k8s.workload.kind
| summarize container_count = count(), by: {k8s.cluster.name, k8s.workload.kind}
| sort k8s.cluster.name, container_count desc

Workload Types: deployment, daemonset, statefulset, job, cronjob

Note: Container image names/versions NOT available in smartscape.

→ For K8s version tracking, see references/container-monitoring.md
→ For container lifecycle, see references/container-monitoring.md

7. Cost Attribution and Chargeback

Calculate infrastructure costs by cost center.

smartscapeNodes "HOST"
| fieldsAdd dt.cost.costcenter, host.logical.cpu.cores, host.physical.memory
| filter isNotNull(dt.cost.costcenter)
| fieldsAdd memory_gb = toDouble(host.physical.memory) / 1024 / 1024 / 1024
| summarize 
    host_count = count(),
    total_cores = sum(toLong(host.logical.cpu.cores)),
    total_memory_gb = sum(memory_gb),
    by: {dt.cost.costcenter}
| sort total_cores desc

→ For product-level cost tracking, see references/inventory-discovery.md

8. Infrastructure Health Correlation

Correlate host and process metrics for cross-layer analysis.

timeseries {
  host_cpu = avg(dt.host.cpu.usage),
  host_memory = avg(dt.host.memory.usage),
  process_cpu = avg(dt.process.cpu.usage)
}, by: {dt.smartscape.host, dt.smartscape.process}
| fieldsAdd
    host_name = getNodeName(dt.smartscape.host),
    process_name = getNodeName(dt.smartscape.process)
| filter arrayAvg(host_cpu) > 70
| sort arrayAvg(host_cpu) desc

Health scoring: Critical if any resource >90%, warning if >80%

→ For multi-resource saturation detection, see references/host-metrics.md


Common Query Patterns

Pattern 1: Smartscape Discovery

Use smartscapeNodes to discover and classify entities.

smartscapeNodes "HOST"
| fieldsAdd <attributes>
| filter <conditions>
| summarize <aggregations>

Pattern 2: Timeseries Performance

Use timeseries to analyze metrics over time.

timeseries metric = avg(dt.host.<metric>), by: {dt.smartscape.host}
| fieldsAdd <calculations>
| filter <thresholds>

Pattern 3: Cross-Layer Correlation

Correlate host and process metrics.

timeseries {
  host_cpu = avg(dt.host.cpu.usage),
  process_cpu = avg(dt.process.cpu.usage)
}, by: {dt.smartscape.host, dt.smartscape.process}

Pattern 4: Entity Enrichment with Lookup

Enrich data with entity attributes. After lookup, reference fields with lookup. prefix.

timeseries cpu = avg(dt.host.cpu.usage), by: {dt.smartscape.host}
| lookup [
    smartscapeNodes HOST
    | fields id, cpuCores, memoryTotal
  ], sourceField:dt.smartscape.host, lookupField:id
| fieldsAdd cores = lookup.cpuCores, mem_gb = lookup.memoryTotal / 1024 / 1024 / 1024

Tags and Metadata

Important Notes

  • Generic tags field is NOT populated in smartscape queries
  • Use specific tag fields: tags:azure[*], tags:environment
  • Use custom metadata: host.custom.metadata[*]

Available Tags

  • Azure Tags: tags:azure[dt_owner_team], tags:azure[dt_cloudcost_capability]
  • Environment: tags:environment
  • Custom Metadata: host.custom.metadata[OperatorVersion], host.custom.metadata[Cluster]
  • Cost: dt.cost.costcenter, dt.cost.product

→ For complete tag reference, see references/inventory-discovery.md


Cloud-Specific Attributes

AWS

  • cloud.provider == "aws"
  • aws.region, aws.availability_zone, aws.account.id
  • aws.resource.id, aws.resource.name
  • aws.state (running, stopped, terminated)

Azure

  • cloud.provider == "azure"
  • azure.location, azure.subscription, azure.resource.group
  • azure.status, azure.provisioning_state
  • azure.resource.sku.name (VM size)

Kubernetes

  • k8s.cluster.name, k8s.cluster.uid
  • k8s.namespace.name, k8s.node.name, k8s.pod.name
  • k8s.workload.name, k8s.workload.kind

→ For multi-cloud analysis, see references/inventory-discovery.md


Best Practices

Alerting

  1. Use percentiles (p95, p99) for latency metrics
  2. Use max() for resource limits
  3. Use avg() for utilization trends
  4. Set multi-level thresholds (warning at 80%, critical at 90%)

Time Windows

  • Real-time: 5-15 minute windows
  • Trends: 24 hours to 7 days
  • Capacity planning: 30-90 days

Query Optimization

  1. Use filters early in the pipeline
  2. Limit results with | limit N
  3. Use specific entity types in smartscapeNodes
  4. Aggregate before enrichment (lookup)

Data Quality

  1. Validate metadata completeness (target >90%)
  2. Check for duplicate host names
  3. Ensure cost tag coverage
  4. Monitor data freshness (lifetime.end)

Limitations and Notes

Smartscape Limitations

  • Container image names/versions NOT available in smartscape
  • Generic tags field NOT populated (use specific tag namespaces)
  • Process metadata varies by process type

Platform-Specific

  • dt.host.cpu.iowait available on Linux only
  • AIX has specific CPU metrics (entitlement, physc)
  • Inode metrics available on Linux only

Best Practices

  • Use getNodeName() to get human-readable names
  • Convert bytes to GB for readability: / 1024 / 1024 / 1024
  • Round aggregated values: round(value, decimals: 1)
  • Use isNotNull() checks before array operations

When to Load References

This skill uses progressive disclosure. Start here for 80% of use cases. Load reference files for detailed specifications when needed.

Load host-metrics.md when:

  • Analyzing CPU component breakdown (user, system, iowait, steal)
  • Investigating memory pressure and swap usage
  • Troubleshooting disk I/O latency
  • Diagnosing network packet drops or errors

Load process-monitoring.md when:

  • Analyzing process-level I/O patterns
  • Investigating TCP connection quality
  • Detecting resource exhaustion (file descriptors, threads)
  • Tracking GC suspension time

Load container-monitoring.md when:

  • Analyzing container lifecycle and churn
  • Tracking Kubernetes version distribution
  • Managing OneAgent operator versions
  • Planning K8s cluster upgrades

Load inventory-discovery.md when:

  • Performing security audits via port discovery
  • Implementing cost attribution and chargeback
  • Validating data quality and metadata completeness
  • Managing multi-cloud infrastructure

References

  • host-metrics.md - Detailed host CPU, memory, disk, and network monitoring
  • process-monitoring.md - Process-level CPU, memory, I/O, and network analysis
  • container-monitoring.md - Container inventory, Kubernetes versions, and operator management
  • inventory-discovery.md - Host/process discovery, technology inventory, cost attribution, and data quality

Repository
Dynatrace/dynatrace-for-ai
Last updated
Created

Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.