nitinjain999/platform-skills

Production-grade platform engineering handbook — Kubernetes, Terraform, Flux CD, GitHub Actions, AWS, and more.

Status: Stable

Datadog Examples

Production-ready Datadog monitors and SLOs managed as Terraform code, with LLM Observability instrumentation examples in Python and Node.js.

Examples

| Example | Type | Description |
| --- | --- | --- |
| terraform/monitors.tf | Terraform | Error rate monitor, p99 latency monitor, and 30-day availability SLO |
| llm-observability/llmobs-python.py | Python | LLMObs instrumentation with @llm, @workflow, @retrieval decorators and faithfulness evaluation |
| llm-observability/llmobs-nodejs.js | Node.js | LLMObs instrumentation with llmobs.trace() and evaluation submission |
| llm-observability/evaluator-bootstrap.py | Python | Faithfulness and quality evaluator stubs generated from production trace patterns |

Quick Start

# Export credentials (never hardcode)
export DD_API_KEY="your-api-key"
export DD_APP_KEY="your-app-key"

cd terraform
terraform init
terraform plan
terraform apply

What terraform/monitors.tf Creates

| Resource | Type | Threshold |
| --- | --- | --- |
| orders-service high error rate | Metric alert | Critical: > 5%, Warning: > 2% |
| orders-service p99 latency high | Metric alert | Critical: > 1s, Warning: > 0.5s |
| Orders Service Availability | SLO (metric) | Target: 99.9%, Warning: 99.95% over 30d |
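
To make the numbers in the table concrete, here is a small Python sketch of the arithmetic behind them. It is illustrative only: the function names are hypothetical and do not appear in monitors.tf, and Datadog evaluates these conditions itself. Because the SLO is metric-based, the budget is expressed as a share of requests rather than minutes.

```python
def allowed_bad_events(target_pct: float, total_events: int) -> float:
    """Number of failed events a metric-based SLO tolerates at a given target."""
    return total_events * (100.0 - target_pct) / 100.0


def classify(value: float, warning: float, critical: float) -> str:
    """Map a metric value to OK / WARNING / CRITICAL for '>' thresholds."""
    if value > critical:
        return "CRITICAL"
    if value > warning:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Error budget for 1,000,000 requests in the 30-day window (hypothetical volume).
    print(allowed_bad_events(99.9, 1_000_000))
    # A 3% error rate sits between the 2% warning and 5% critical thresholds.
    print(classify(0.03, warning=0.02, critical=0.05))  # WARNING
```

With evenly distributed traffic, a 99.9% target over 30 days corresponds to roughly 43.2 minutes of full unavailability (0.1% of 43,200 minutes).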

Key Patterns

Unified Service Tagging (required on all Datadog resources)

Always set these three tags consistently across pods, traces, logs, and monitors:

# Pod environment variables
env:
  - name: DD_ENV
    value: "production"
  - name: DD_SERVICE
    value: "orders-service"
  - name: DD_VERSION
    valueFrom:
      fieldRef:
        # app.kubernetes.io/version is normally set as a label, so read it from metadata.labels
        fieldPath: metadata.labels['app.kubernetes.io/version']
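
One way to enforce this in review or CI is a small check over a container spec. The following Python sketch assumes the spec has already been parsed into a dictionary shaped like the YAML above; the function name and the sample data are hypothetical.

```python
REQUIRED_TAG_VARS = ("DD_ENV", "DD_SERVICE", "DD_VERSION")


def missing_unified_tags(container: dict) -> list[str]:
    """Return the unified service tagging env vars absent from a container spec."""
    # An entry counts as set if it has a literal value or a valueFrom reference.
    defined = {
        entry["name"]
        for entry in (container.get("env") or [])
        if "value" in entry or "valueFrom" in entry
    }
    return [name for name in REQUIRED_TAG_VARS if name not in defined]


if __name__ == "__main__":
    # Hypothetical container that only sets DD_ENV.
    partial = {"env": [{"name": "DD_ENV", "value": "production"}]}
    print(missing_unified_tags(partial))  # ['DD_SERVICE', 'DD_VERSION']
```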

Secure Agent installation (no --set apiKey)

# ✅ Create namespace + secret (idempotent)
kubectl create namespace datadog --dry-run=client -o yaml | kubectl apply -f -
kubectl create secret generic datadog-secret \
  --from-literal=api-key="${DD_API_KEY}" \
  -n datadog \
  --dry-run=client -o yaml | kubectl apply -f -

# ✅ Reference secret in Helm values
helm upgrade --install datadog datadog/datadog \
  --set datadog.apiKeyExistingSecret=datadog-secret \
  --create-namespace \
  -n datadog

# ❌ Never pass key on command line — stored in Helm release history
# helm upgrade --install datadog datadog/datadog --set datadog.apiKey="${DD_API_KEY}"

Log + trace correlation

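// dd-trace init: must load before any instrumented module.
// With ES modules, imports are hoisted above tracer.init(), so keep this
// in its own file (for example tracer.js) and import that file first.
import tracer from "dd-trace";
tracer.init({
  service: "orders-service",
  env: process.env.DD_ENV,
  logInjection: true,  // injects trace_id and span_id into log lines
});
export default tracer;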
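
To show what correlation relies on, here is a stdlib-only Python sketch that emits a JSON log line carrying trace identifiers. It is not how dd-trace implements injection: the class and function names are hypothetical, the IDs are passed in by hand instead of coming from a tracer, and the nested dd object is an assumption about the log shape.

```python
import json
import logging


class TraceContextFilter(logging.Filter):
    """Attach trace identifiers to each record (stand-in for a real tracer)."""

    def __init__(self, trace_id: str, span_id: str) -> None:
        super().__init__()
        self.trace_id = trace_id
        self.span_id = span_id

    def filter(self, record: logging.LogRecord) -> bool:
        record.dd = {"trace_id": self.trace_id, "span_id": self.span_id}
        return True  # never drop the record, only annotate it


class JsonFormatter(logging.Formatter):
    """Render records as JSON so the trace fields stay machine-readable."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            "dd": getattr(record, "dd", {}),
        }
        return json.dumps(payload)


def format_with_trace(message: str, trace_id: str, span_id: str) -> str:
    """Build one log line annotated with the given trace context."""
    record = logging.LogRecord(
        "orders-service", logging.INFO, "", 0, message, None, None
    )
    TraceContextFilter(trace_id, span_id).filter(record)
    return JsonFormatter().format(record)


if __name__ == "__main__":
    # Hypothetical IDs; a tracer would supply the active span's values.
    print(format_with_trace("order created", "1234567890", "987654321"))
```

Because every line carries the same trace_id as the span that produced it, a log entry can be matched to its trace, and the reverse.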

Adapt to Your Service

Replace orders-service in monitors.tf with your service name:

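# GNU sed shown. On macOS (BSD sed), use: sed -i '' 's/orders-service/my-service/g' terraform/monitors.tf
sed -i 's/orders-service/my-service/g' terraform/monitors.tf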
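
The pattern above matches only the hyphenated name, so a display name such as Orders Service Availability would keep its old title. The Python sketch below handles both forms; the function name is hypothetical, and it assumes the display name is simply the title-cased service name.

```python
from pathlib import Path


def rename_service(text: str, old: str, new: str) -> str:
    """Replace a hyphenated service name and its title-cased display form."""

    def title(name: str) -> str:
        # "orders-service" -> "Orders Service"
        return " ".join(part.capitalize() for part in name.split("-"))

    return text.replace(old, new).replace(title(old), title(new))


def rename_in_file(path: str, old: str, new: str) -> None:
    """Rewrite the file in place with both name forms replaced."""
    target = Path(path)
    target.write_text(rename_service(target.read_text(), old, new))


if __name__ == "__main__":
    sample = 'name = "Orders Service Availability"'
    print(rename_service(sample, "orders-service", "my-service"))
```

Review the result with git diff before running terraform plan.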

Checklist

  • Unified Service Tagging applied: DD_ENV, DD_SERVICE, DD_VERSION on every pod
  • API key stored in Kubernetes Secret — not in Helm values or environment
  • logInjection: true in tracer init — enables log/trace correlation
  • Monitors notify correct PagerDuty/Slack channels (@pagerduty-*, @slack-*)
  • SLO target and timeframe match your error budget
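
The notification item can be checked mechanically. This Python sketch extracts @pagerduty-* and @slack-* handles from a monitor message so that a message with no route is caught in review; the function name and the sample messages are hypothetical.

```python
import re

# Matches handles of the form @pagerduty-<name> or @slack-<name>.
HANDLE_PATTERN = re.compile(r"@(?:pagerduty|slack)-[\w-]+")


def notification_handles(message: str) -> list[str]:
    """Return the PagerDuty and Slack handles mentioned in a monitor message."""
    return HANDLE_PATTERN.findall(message)


if __name__ == "__main__":
    message = "High error rate on orders-service @pagerduty-orders @slack-orders-alerts"
    print(notification_handles(message))  # ['@pagerduty-orders', '@slack-orders-alerts']
```

An empty result means the monitor would fire without paging or posting anywhere.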

See Also

  • references/datadog.md — Agent setup, APM, log management, monitors, dashboards, SLOs, synthetic tests, MCP server, pup CLI, Datadog Labs skills
  • references/llm-observability.md — LLMObs instrumentation, eval bootstrap, trace RCA, experiment analysis
  • /platform-skills:datadog — setup, instrument, monitor, dashboard, SLO, investigate incidents
