Production-grade platform engineering handbook — Kubernetes, Terraform, Flux CD, GitHub Actions, AWS, and more.
Status: Stable
Production-ready Datadog monitors, dashboards, and SLOs managed as Terraform code.
| Example | Type | Description |
|---|---|---|
| terraform/monitors.tf | Terraform | Error rate monitor, p99 latency monitor, and 30-day availability SLO |
| llm-observability/llmobs-python.py | Python | LLMObs instrumentation with @llm, @workflow, @retrieval decorators and faithfulness evaluation |
| llm-observability/llmobs-nodejs.js | Node.js | LLMObs instrumentation with llmobs.trace() and evaluation submission |
| llm-observability/evaluator-bootstrap.py | Python | Faithfulness and quality evaluator stubs generated from production trace patterns |
# Export credentials (never hardcode)
export DD_API_KEY="your-api-key"
export DD_APP_KEY="your-app-key"
cd terraform
terraform init
terraform plan
terraform apply

| Resource | Type | Threshold |
|---|---|---|
| orders-service high error rate | Metric alert | Critical: > 5%, Warning: > 2% |
| orders-service p99 latency high | Metric alert | Critical: > 1s, Warning: > 0.5s |
| Orders Service Availability | SLO (metric) | Target: 99.9%, Warning: 99.95% over 30d |
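
To make the numbers in the table concrete, the following Python sketch classifies a measured value against the warning and critical limits and estimates how many failed requests a 99.9% target tolerates. It is an illustration only: the dictionary layout and function names are hypothetical and are not taken from monitors.tf, and the real evaluation is done by Datadog.

```python
# Illustrative sketch: thresholds copied from the table; all names are hypothetical.
THRESHOLDS = {
    "error_rate_pct": {"warning": 2.0, "critical": 5.0},
    "p99_latency_s": {"warning": 0.5, "critical": 1.0},
}


def classify(metric: str, value: float) -> str:
    """Return 'critical', 'warning', or 'ok' using strict greater-than checks."""
    limits = THRESHOLDS[metric]
    if value > limits["critical"]:
        return "critical"
    if value > limits["warning"]:
        return "warning"
    return "ok"


def allowed_bad_events(target_pct: float, total_events: int) -> float:
    """Error budget of a metric-based SLO, expressed as a count of bad events."""
    return total_events * (100.0 - target_pct) / 100.0


if __name__ == "__main__":
    print(classify("error_rate_pct", 3.1))   # between warning and critical
    print(classify("p99_latency_s", 1.2))    # above the critical limit
    print(allowed_bad_events(99.9, 1_000_000))
```

Because the SLO is metric-based, the budget is counted in events rather than minutes: with a 99.9% target, roughly one request in a thousand may fail within the 30-day window before the target is missed.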
Always set these three tags consistently across pods, traces, logs, and monitors:
# Pod environment variables
env:
  - name: DD_ENV
    value: "production"
  - name: DD_SERVICE
    value: "orders-service"
  - name: DD_VERSION
    valueFrom:
      fieldRef:
        fieldPath: metadata.annotations['app.kubernetes.io/version']

Store the API key in a Kubernetes secret (never pass it with --set apiKey):

# ✅ Create namespace + secret (idempotent)
kubectl create namespace datadog --dry-run=client -o yaml | kubectl apply -f -
kubectl create secret generic datadog-secret \
  --from-literal=api-key="${DD_API_KEY}" \
  -n datadog \
  --dry-run=client -o yaml | kubectl apply -f -

# ✅ Reference secret in Helm values
helm upgrade --install datadog datadog/datadog \
  --set datadog.apiKeyExistingSecret=datadog-secret \
  --create-namespace \
  -n datadog

# ❌ Never pass the key on the command line: it is stored in Helm release history
# helm upgrade --install datadog datadog/datadog --set datadog.apiKey="${DD_API_KEY}"

// dd-trace init: must be the first import
import tracer from "dd-trace";

tracer.init({
  service: "orders-service",
  env: process.env.DD_ENV,
  logInjection: true, // injects trace_id and span_id into log lines
});

Replace orders-service in monitors.tf with your service name:

# GNU sed syntax; on BSD/macOS use: sed -i '' 's/orders-service/my-service/g' terraform/monitors.tf
sed -i 's/orders-service/my-service/g' terraform/monitors.tf

- DD_ENV, DD_SERVICE, DD_VERSION on every pod
- logInjection: true in tracer init (enables log/trace correlation)
- … (@pagerduty-*, @slack-*)

/platform-skills:datadog (setup, instrument, monitor, dashboard, SLO, investigate incidents)
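
The requirement that DD_ENV, DD_SERVICE, and DD_VERSION appear on every pod can be checked mechanically before deployment. The sketch below is a minimal example of such a check; it assumes the container spec has already been loaded into a Python dict (for instance from a rendered manifest), and the helper name `missing_unified_tags` is hypothetical rather than part of this repository.

```python
# Hypothetical helper: the dict mirrors the shape of a Kubernetes container spec.
REQUIRED_TAGS = ("DD_ENV", "DD_SERVICE", "DD_VERSION")


def missing_unified_tags(container: dict) -> list:
    """Return the required DD_* variables absent from the container's env list."""
    present = {entry.get("name") for entry in container.get("env", [])}
    return [tag for tag in REQUIRED_TAGS if tag not in present]


if __name__ == "__main__":
    container = {
        "name": "orders-service",
        "env": [
            {"name": "DD_ENV", "value": "production"},
            {"name": "DD_SERVICE", "value": "orders-service"},
            # DD_VERSION is deliberately omitted to show a failing case.
        ],
    }
    missing = missing_unified_tags(container)
    if missing:
        print("missing tags:", ", ".join(missing))
```

A check like this only confirms that the variables are declared; it does not verify that their values match the tags used on traces, logs, and monitors.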