Production-grade platform engineering handbook — Kubernetes, Terraform, Flux CD, GitHub Actions, AWS, and more.
Configure, troubleshoot, or investigate incidents in Dynatrace.
Deploy the Dynatrace Operator and OneAgent on Kubernetes.
Steps:
1. Apply the operator manifest: kubectl apply -f .../dynatrace-operator/releases/latest/download/kubernetes.yaml
2. Create apiToken and dataIngestToken (store in a Kubernetes Secret or secrets manager, never as plain values)
3. Apply a DynaKube CR with cloudNativeFullStack for automatic injection; no pod restarts required
4. Set metadataEnrichment: true for Kubernetes metadata on all telemetry
5. Verify injection: kubectl describe pod <app-pod> | grep dynatrace
Required token scopes:
apiToken: ReadConfig WriteConfig DataExport LogExport ReadSyntheticData WriteAnomalyDetection
dataIngestToken: metrics.ingest logs.ingest
Reference: references/dynatrace.md → Deployment, Token Scopes
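The token and DynaKube steps above might look like the following sketch. It assumes the operator convention that the Secret shares the DynaKube's name and carries apiToken and dataIngestToken keys. The resource name dynakube, the environment ID abc12345, the variable names DT_API_TOKEN and DT_DATA_INGEST_TOKEN, and the apiVersion are illustrative assumptions; check them against the CRD your operator release installs, where metadata enrichment is usually expressed as metadataEnrichment.enabled.

```shell
# Render a minimal DynaKube manifest to stdout.
# $1 = Dynatrace environment ID (hypothetical placeholder such as abc12345)
render_dynakube() {
  cat <<EOF
apiVersion: dynatrace.com/v1beta2   # assumption: use the version your operator's CRD serves
kind: DynaKube
metadata:
  name: dynakube                    # assumption: same name as the token Secret
  namespace: dynatrace
spec:
  apiUrl: https://$1.live.dynatrace.com/api
  metadataEnrichment:
    enabled: true
  oneAgent:
    cloudNativeFullStack: {}
EOF
}

# Create the token Secret from environment variables instead of values in a file:
#   kubectl -n dynatrace create secret generic dynakube \
#     --from-literal=apiToken="${DT_API_TOKEN}" \
#     --from-literal=dataIngestToken="${DT_DATA_INGEST_TOKEN}"
# Then apply the rendered manifest:
#   render_dynakube abc12345 | kubectl apply -f -
```

Rendering to stdout keeps the manifest reviewable before it reaches the cluster, and the tokens never appear in a committed file.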
Add custom spans or business transaction tracing to a service.
Steps:
1. Install the SDK for the service's language: @dynatrace/oneagent-sdk (Node.js), oneagent-sdk (Python), or OneAgent Java SDK
2. Propagate the x-dynatrace header between services
Reference: references/dynatrace.md → Code-Level Instrumentation
Configure anomaly detection and alerting for a service.
Steps:
1. Define a dynatrace_service_anomalies_v2 resource for failure rate and response time
2. Add a dynatrace_alerting profile linking anomalies to the team
Reference: references/dynatrace.md → Terraform Provider, Davis AI Problem Feeds
Define a Dynatrace SLO.
Steps:
1. Define a dynatrace_slo_v2 resource
2. Use the success ratio as the metric expression: builtin:service.errors.server.successCount / builtin:service.requestCount.server
3. Set target_success and target_warning thresholds
Reference: references/dynatrace.md → SLOs
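As a rough worked example of how the two thresholds relate to the success ratio above, the function below turns two counts into a percentage and classifies it. It assumes the usual Dynatrace SLO semantics, where the warning threshold sits above the target (the SLO is still met in the warning band but is close to failing). The function name slo_status, the state labels, and the sample numbers are hypothetical.

```shell
# slo_status SUCCESS_COUNT REQUEST_COUNT TARGET WARNING
# Prints "<success percentage> <state>".
slo_status() {
  awk -v s="$1" -v t="$2" -v target="$3" -v warn="$4" 'BEGIN {
    if (t == 0) { print "n/a NO_DATA"; exit }
    pct = 100 * s / t
    if (pct >= warn)        state = "OK"
    else if (pct >= target) state = "WARNING"
    else                    state = "FAILURE"
    printf "%.2f %s\n", pct, state
  }'
}

slo_status 9990 10000 99.0 99.5   # prints: 99.90 OK
slo_status 9920 10000 99.0 99.5   # prints: 99.20 WARNING
slo_status 9800 10000 99.0 99.5   # prints: 98.00 FAILURE
```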
Create a Dynatrace dashboard.
Steps:
1. Define a dynatrace_json_dashboard resource pointing to a JSON dashboard file
2. Chart builtin:service.requestCount.server, builtin:service.response.time, builtin:service.errors.server.rate
3. Use the DATA_EXPLORER tile type
Reference: references/dynatrace.md → Terraform Provider
Live incident investigation using the Dynatrace MCP server.
Requires the Dynatrace MCP server connected to Claude Code. See setup in references/dynatrace.md → MCP Server Setup.
Cost note: execute_dql queries scan Grail data and may incur costs based on your Dynatrace consumption model. Start with short timeframes (last 1h–24h). Set DT_GRAIL_QUERY_BUDGET_GB to cap session spend.
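One way to keep a scan short is to bound the timeframe inside the query itself. The sketch below prints an error-log query with an explicit from: parameter on fetch. The helper name dql_error_logs and the default window of 1h are hypothetical; the field names are the ones used elsewhere in this guide.

```shell
# dql_error_logs SERVICE [WINDOW]
# Prints a DQL query limited to the given lookback window (default 1h).
dql_error_logs() {
  cat <<EOF
fetch logs, from:now() - ${2:-1h}
| filter service.name == "$1" and loglevel == "ERROR"
| sort timestamp desc
| limit 50
EOF
}

dql_error_logs orders-service 1h
```

The printed query can be pasted into a notebook or handed to the MCP server; widen the window only after the short one has ruled things out.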
Ask Claude to run these via the MCP server:
List all open Problems in Dynatrace right now.
Get full details for Problem <problem-id> including root cause and affected entities.
Show me Kubernetes events for namespace production in the last 30 minutes.
Use DQL via the MCP to query Grail directly:
Show me error logs for the orders-service in the last hour.
List the top exceptions for orders-service with stack traces.
Find distributed traces with errors for orders-service and show the slowest and most frequent.
Example DQL the MCP can generate and execute:
fetch logs
| filter service.name == "orders-service" and loglevel == "ERROR"
| sort timestamp desc
| limit 50
| fields timestamp, content, trace_id, span_id
fetch spans
| filter service.name == "orders-service" and status == "ERROR"
| sort timestamp desc
| limit 20
| fields timestamp, span_name, error.message, trace_id, duration
Let Davis AI perform automated analysis:
Chat with Davis Copilot: "Why is orders-service having elevated error rates since 14:00 UTC?"
List available Davis analyzers for root cause analysis.
Find the entity named "orders-service" and show its current health.
Create a Dynatrace notebook summarising the orders-service incident with the DQL queries we ran.
Send a Slack message to #incidents: "orders-service error rate normalising, fix deployed at <time>."
Close Problem <problem-id> with message "Root cause: bad deploy at 14:05 UTC, rolled back at 14:22 UTC."
Reference: references/dynatrace.md → MCP Server Setup, Incident Investigation Workflow
Diagnose Dynatrace data gaps or injection failures without the MCP server.
Classify the failure:
Evidence to collect:
# Operator and DynaKube status
kubectl -n dynatrace get dynakube
kubectl -n dynatrace get pods
# Check injection on app pod
kubectl describe pod <app-pod> | grep -i dynatrace
# Verify MINT ingestion (custom metrics)
curl "https://{env}.live.dynatrace.com/api/v2/metrics/query?metricSelector=<your.metric>" \
-H "Authorization: Api-Token ${DT_API_TOKEN}"
# List open Davis AI problems
curl "https://{env}.live.dynatrace.com/api/v2/problems?problemSelector=status(OPEN)" \
-H "Authorization: Api-Token ${DT_API_TOKEN}"
Provide: symptom → root cause hypothesis → evidence command → fix → validation step
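The two API checks above can be wrapped in small helpers so the environment and token come from variables and the query parameter is URL-encoded by curl. This is a sketch under assumptions: DT_ENV_ID is a hypothetical variable standing in for {env}, the helper names are illustrative, and the token needs whatever read scopes the queried endpoints require.

```shell
# dt_url PATH -> prints the API v2 URL for PATH in the configured environment.
dt_url() {
  printf 'https://%s.live.dynatrace.com/api/v2/%s\n' "${DT_ENV_ID:?set DT_ENV_ID}" "$1"
}

# dt_get PATH 'key=value' -> GET request with one URL-encoded query parameter.
# --fail makes curl return non-zero on HTTP errors such as 401 or 403.
dt_get() {
  curl --silent --show-error --fail --get "$(dt_url "$1")" \
    --data-urlencode "$2" \
    -H "Authorization: Api-Token ${DT_API_TOKEN:?set DT_API_TOKEN}"
}

# Usage (needs network access and a valid token):
#   dt_get metrics/query 'metricSelector=<your.metric>'
#   dt_get problems 'problemSelector=status(OPEN)'
```

A non-zero exit from dt_get points at the token or URL rather than at missing data, which helps separate authentication failures from genuine ingestion gaps.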