
OpenTelemetry Collector Architecture & Deployment Patterns

Overview

This reference provides comprehensive guidance on deploying the OpenTelemetry Collector in production environments, with a focus on Kubernetes architectures, scaling patterns, and the critical concept of load balancing stickiness for stateful operations like tail sampling.

Table of Contents

  1. Deployment Decision Matrix
  2. Agent Pattern (DaemonSet)
  3. Gateway Pattern (Deployment)
  4. Sidecar Pattern
  5. Hybrid Architecture
  6. Scaling Stateful Collectors
  7. Load Balancing & Sticky Sessions
  8. Target Allocator for Prometheus
  9. Resource Sizing Guidelines
  10. When NOT to Scale

Deployment Decision Matrix

Choose the right deployment architecture based on your requirements:

| Pattern | Use Case | Pros | Cons | When to Use |
| --- | --- | --- | --- | --- |
| Agent (DaemonSet) | Host metrics, logs | 1 per node efficiency | No central aggregation | Logs, host metrics, K8s events |
| Gateway (Deployment) | Tail sampling, aggregation | Central processing, scales independently | Additional network hop | Tail sampling, metric aggregation, fan-out |
| Sidecar | Per-pod isolation | Strict isolation, no RBAC | High resource overhead | Serverless (Fargate), security isolation |
| Hybrid | Production systems | Best of both | Increased complexity | Most production deployments |

Decision Tree

Do you need to collect logs or host metrics?
├─ YES → Deploy Agent (DaemonSet)
└─ NO  → Continue

Do you need tail sampling or span-to-metrics?
├─ YES → Deploy Gateway (Deployment) with sticky sessions
└─ NO  → Continue

Are you on serverless (Fargate/Lambda)?
├─ YES → Deploy Sidecar or use Lambda Extension Layer (see [platforms.md](platforms.md))
└─ NO  → Deploy Gateway for centralized export

Agent Pattern (DaemonSet)

Architecture

The Agent runs one collector pod per Kubernetes node. It collects:

  • Host metrics (CPU, memory, disk, network)
  • Logs from /var/log/pods
  • Kubernetes events
  • Application metrics via pod-local endpoints
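
A minimal agent-side receiver sketch covering these sources is shown below; the hostmetrics, filelog, and k8s_events receivers are standard contrib components, while the collection interval and log path are illustrative assumptions:

# Agent receiver sketch (illustrative values, contrib image)
receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu: {}
      memory: {}
      disk: {}
      network: {}
  filelog:
    include: [/var/log/pods/*/*/*.log]
    start_at: end
  k8s_events: {}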

Configuration

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-agent
  namespace: observability
spec:
  selector:
    matchLabels:
      app: otel-agent
  template:
    metadata:
      labels:
        app: otel-agent
    spec:
      serviceAccountName: otel-agent
      containers:
      - name: otel-collector
        image: otel/opentelemetry-collector-contrib:0.147.0
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers

RBAC Requirements

apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-agent
  namespace: observability
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-agent
rules:
- apiGroups: [""]
  resources: ["pods", "nodes", "namespaces"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-agent
subjects:
- kind: ServiceAccount
  name: otel-agent
  namespace: observability

Best Practices

✅ Always use DaemonSet for logs - It's the only way to access node-local log files
✅ Use tolerations - Ensure the agent runs on all nodes, including the control plane (see the sketch below)
✅ Resource limits - Set conservative limits (512Mi memory) to prevent node resource exhaustion
✅ Node-local processing - Do heavy processing (filtering, redaction) in the agent before forwarding
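
A toleration sketch for the agent DaemonSet; taint keys differ between Kubernetes distributions, so treat these values as assumptions to adjust:

# Add under the DaemonSet's pod template spec
tolerations:
- key: node-role.kubernetes.io/control-plane
  operator: Exists
  effect: NoSchedule
- operator: Exists   # broad catch-all for custom taints; narrow it if some nodes should be excluded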


Gateway Pattern (Deployment)

Architecture

The Gateway is a stateless or stateful collector deployment that:

  • Aggregates data from multiple sources
  • Performs tail sampling (requires stickiness)
  • Reduces egress connections to backends
  • Provides a central point for authentication
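
If the gateway is your authentication boundary, one option is a server-side authenticator on the OTLP receiver. A hedged sketch using the contrib bearertokenauth extension follows; the GATEWAY_TOKEN environment variable is an assumption:

# Gateway authentication sketch (bearertokenauth extension, contrib image)
extensions:
  bearertokenauth:
    token: ${env:GATEWAY_TOKEN}
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        auth:
          authenticator: bearertokenauth
service:
  extensions: [bearertokenauth]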

Configuration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-gateway
  namespace: observability
spec:
  replicas: 3
  selector:
    matchLabels:
      app: otel-gateway
  template:
    metadata:
      labels:
        app: otel-gateway
    spec:
      serviceAccountName: otel-gateway
      containers:
      - name: otel-collector
        image: otel/opentelemetry-collector-contrib:0.147.0
        ports:
        - containerPort: 4317  # OTLP gRPC
          name: otlp-grpc
        - containerPort: 4318  # OTLP HTTP
          name: otlp-http
        - containerPort: 8888  # Metrics
          name: metrics
        - containerPort: 13133 # Health check
          name: health
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 2000m
            memory: 4Gi
        livenessProbe:
          httpGet:
            path: /
            port: 13133
        readinessProbe:
          httpGet:
            path: /
            port: 13133
---
apiVersion: v1
kind: Service
metadata:
  name: otel-gateway
  namespace: observability
spec:
  selector:
    app: otel-gateway
  ports:
  - name: otlp-grpc
    port: 4317
    targetPort: 4317
  - name: otlp-http
    port: 4318
    targetPort: 4318
  - name: metrics
    port: 8888
    targetPort: 8888

Horizontal Pod Autoscaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: otel-gateway-hpa
  namespace: observability
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: otel-gateway
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

Best Practices

✅ 3 replicas minimum - Ensures high availability during rolling updates
✅ Resource over-provisioning - Gateway handles bursts; set limits 2-4x requests
✅ HPA on memory - Collector memory usage is often the bottleneck
✅ Health checks - Always configure liveness and readiness probes (and enable the health_check extension; see the sketch below)
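
The liveness and readiness probes in the Deployment above only succeed if the collector config enables the health_check extension. A minimal sketch matching the probe port 13133:

# health_check extension backing the probes on port 13133
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
service:
  extensions: [health_check]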


Sidecar Pattern

Architecture

Each application pod gets its own collector container. Use only for:

  • Serverless environments (AWS Fargate, Google Cloud Run)
  • Strict security isolation requirements
  • Per-tenant data segregation

Note: For AWS Lambda, use the Collector Extension Layer pattern instead. See platforms.md for Lambda-specific configuration.

Configuration

apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  containers:
  - name: app
    image: myapp:latest
    env:
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: "http://localhost:4317"
  - name: otel-sidecar
    image: otel/opentelemetry-collector-contrib:0.147.0
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 200m
        memory: 256Mi

Trade-offs

❌ High resource overhead - Every pod pays the memory/CPU cost
❌ Deployment complexity - Must update all pods to change collector config (the operator-injected sidecar sketch below mitigates this)
❌ No RBAC - Cannot use k8sattributes processor (no cluster access)

✅ Strong isolation - Tenant A's collector never sees Tenant B's data
✅ Fargate compatible - No DaemonSet support in serverless
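
Where the OpenTelemetry Operator is available, injecting the sidecar from a shared CR reduces the per-pod config burden noted above. A sketch; the CR name and pipeline contents are assumptions, while the annotation is the operator's documented injection mechanism:

# Operator-managed sidecar sketch
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: sidecar
  namespace: default
spec:
  mode: sidecar
  config: |
    receivers:
      otlp:
        protocols:
          grpc: {}
    exporters:
      otlp:
        endpoint: otel-gateway.observability.svc.cluster.local:4317
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp]
---
# Then annotate the application pod template:
metadata:
  annotations:
    sidecar.opentelemetry.io/inject: "true"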


Hybrid Architecture

Pattern

Agent (DaemonSet) → Gateway (Deployment) → Backend

This is the recommended production architecture:

  1. Agent: Collects logs, host metrics, and forwards traces to Gateway
  2. Gateway: Performs tail sampling, aggregation, and exports to backend

Configuration Flow

# Agent config snippet
exporters:
  otlp:
    endpoint: otel-gateway.observability.svc.cluster.local:4317
    tls:
      insecure: true  # Within cluster

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]  # Forward to gateway
    logs:
      receivers: [filelog]
      processors: [memory_limiter, k8sattributes, batch]
      exporters: [otlp]  # Forward to gateway

# Gateway config snippet
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    limit_percentage: 80
    spike_limit_percentage: 20
  batch:  # referenced by the traces pipeline below
    send_batch_size: 8192
    timeout: 200ms
  tail_sampling:
    policies:
      - name: errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow
        type: latency
        latency: {threshold_ms: 500}
      - name: default
        type: probabilistic
        probabilistic: {sampling_percentage: 1}

exporters:
  otlp:
    endpoint: backend.example.com:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlp]

Scaling Stateful Collectors

The Stickiness Problem

Tail sampling requires all spans of a trace to arrive at the same collector instance to make correct sampling decisions. Standard Kubernetes Service load balancing (round-robin) breaks this requirement.

Without Stickiness (BROKEN)

Span A (trace_id: 123) → Gateway Pod 1
Span B (trace_id: 123) → Gateway Pod 2  ❌ Different pod!

Result: Incomplete trace → Incorrect sampling decision

With Stickiness (CORRECT)

Span A (trace_id: 123) → Gateway Pod 1
Span B (trace_id: 123) → Gateway Pod 1  ✅ Same pod!

Result: Complete trace → Correct sampling decision


Load Balancing & Sticky Sessions

Solution: Load Balancing Exporter

Use a two-tier architecture:

  1. Pre-Gateway (Agents): Use loadbalancing exporter with traceID routing
  2. Gateway: Runs tail_sampling processor

Best Practice: Choose the routing_key that matches the stateful operation downstream: traceID for tail sampling (all spans of a trace hash to the same gateway instance) or service for span-to-metrics aggregation. The routing key must be deterministic for the lifetime of a trace; if spans route on differing values, stickiness is lost and sampling decisions are made on partial traces. Note that scaling the gateway tier up or down rebalances the consistent-hash ring, so expect brief shard churn during scale events.

Architecture

Agent → LoadBalancing Exporter (routing_key: traceID) → Gateway (Headless Service)

Pre-Gateway Configuration

# In the Agent or Pre-Gateway tier
exporters:
  loadbalancing:
    protocol:
      otlp:
        endpoint: placeholder  # Will be replaced by resolver
        tls:
          insecure: true
    routing_key: "traceID"  # Critical: Ensures stickiness
    resolver:
      k8s:
        service: otel-gateway-headless  # Headless Service
        ports:
          - 4317

Gateway Headless Service

apiVersion: v1
kind: Service
metadata:
  name: otel-gateway-headless
  namespace: observability
spec:
  clusterIP: None  # Headless: Resolver gets individual pod IPs
  selector:
    app: otel-gateway
  ports:
  - name: otlp-grpc
    port: 4317
    targetPort: 4317

How It Works

  1. Resolver discovers pods: The k8s resolver queries the Headless Service and gets individual pod IPs (e.g., 10.0.1.5, 10.0.1.6, 10.0.1.7)
  2. Consistent hashing: The loadbalancing exporter uses traceID as the routing key and applies consistent hashing to select the target pod
  3. Sticky routing: All spans with the same traceID hash to the same pod IP

Critical Requirements

⚠️ Must use a Headless Service - A regular Service returns a virtual IP, not individual pod IPs
⚠️ Must set routing_key: traceID - Make trace-based routing explicit so stickiness does not depend on exporter defaults
⚠️ Gateway must have stable network identity - Use a StatefulSet if pod IPs change frequently
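
If gateway pod identity needs to be more stable, the OpenTelemetry Operator can run the gateway as a StatefulSet. A minimal sketch; the pipeline here is a placeholder, so reuse the gateway configuration shown earlier in practice:

# Gateway as a StatefulSet via the operator (placeholder pipeline)
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-gateway
  namespace: observability
spec:
  mode: statefulset
  replicas: 3
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
    exporters:
      debug: {}
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [debug]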


Target Allocator for Prometheus

The Problem

When scaling the Prometheus receiver, every collector replica scrapes every target → duplicate data.

The Solution

The Target Allocator (part of the OpenTelemetry Operator) shards Prometheus targets across collector replicas.

Architecture

Target Allocator (Discovers targets) → Assigns targets to Collector Replicas
Collector Replica 1: Scrapes targets 1-100
Collector Replica 2: Scrapes targets 101-200
Collector Replica 3: Scrapes targets 201-300

Configuration

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-prometheus
spec:
  mode: statefulset
  replicas: 3
  targetAllocator:
    enabled: true
    serviceAccount: otel-prometheus-ta
    prometheusCR:
      enabled: true  # Discover ServiceMonitors and PodMonitors
  config: |
    receivers:
      prometheus:
        config:
          scrape_configs: []  # Populated by Target Allocator
        target_allocator:
          endpoint: http://otel-prometheus-targetallocator:80
          interval: 30s
          collector_id: "${POD_NAME}"
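
With prometheusCR enabled, the Target Allocator watches Prometheus Operator CRDs. A minimal ServiceMonitor it could pick up might look like the following; the name, namespace, and labels are hypothetical:

# Example ServiceMonitor discovered by the Target Allocator (hypothetical names)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp
  namespace: default
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
  - port: metrics
    interval: 30s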

Benefits

✅ No duplicate metrics - Each target is scraped by exactly one collector
✅ Horizontal scaling - Add replicas to distribute scrape load
✅ Auto-discovery - Integrates with Prometheus Operator CRDs


Resource Sizing Guidelines

Agent (DaemonSet)

| Workload | CPU Request | Memory Request | CPU Limit | Memory Limit |
| --- | --- | --- | --- | --- |
| Low (dev) | 100m | 128Mi | 200m | 256Mi |
| Medium | 200m | 256Mi | 500m | 512Mi |
| High (prod) | 500m | 512Mi | 1000m | 1Gi |

Gateway (Deployment)

| Throughput | CPU Request | Memory Request | CPU Limit | Memory Limit | Replicas |
| --- | --- | --- | --- | --- | --- |
| <1k RPS | 500m | 1Gi | 1000m | 2Gi | 2 |
| 1-10k RPS | 1000m | 2Gi | 2000m | 4Gi | 3 |
| 10-50k RPS | 2000m | 4Gi | 4000m | 8Gi | 5 |
| >50k RPS | 4000m | 8Gi | 8000m | 16Gi | 10+ |

Persistent Storage (file_storage)

| Use Case | Volume Size | IOPS Requirement |
| --- | --- | --- |
| Short buffer (<1h) | 5-10 GB | Standard |
| Medium buffer (1-6h) | 20-50 GB | gp3 (3000 IOPS) |
| Long buffer (>6h) | 100+ GB | gp3 (5000+ IOPS) |
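
A sketch of a persistent sending queue backed by the file_storage extension; the directory, queue size, and backend endpoint are assumptions, and a PVC should be mounted at the directory path:

# Persistent queue sketch using the file_storage extension
extensions:
  file_storage:
    directory: /var/lib/otelcol/buffer
exporters:
  otlp:
    endpoint: backend.example.com:4317
    sending_queue:
      enabled: true
      storage: file_storage
      queue_size: 5000
service:
  extensions: [file_storage]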

When NOT to Scale

Scaling the collector adds replicas and resources, but it does not fix every performance problem. Before scaling, diagnose the actual bottleneck.

The Key Signal: Queue Saturation

otelcol_exporter_queue_size / otelcol_exporter_queue_capacity
| Queue Ratio | Interpretation | Action |
| --- | --- | --- |
| < 0.5 | Healthy — pipeline has headroom | No action needed |
| 0.5–0.8 | Warning — backend is slowing down | Investigate backend latency |
| > 0.8 persistently | ⚠️ Downstream bottleneck | Scaling the collector won't help |

If otelcol_exporter_queue_size is persistently near otelcol_exporter_queue_capacity, the problem is downstream (the backend, not the collector). Adding more collector replicas only fills the backend queue faster.
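
One way to watch this signal is an alerting rule on the collector's self-telemetry. A sketch; the 0.8 threshold and 10m window are assumptions, and the rule presumes the collector's internal metrics are scraped into Prometheus:

# Prometheus alerting rule sketch for persistent queue saturation
groups:
- name: otel-collector-queues
  rules:
  - alert: OtelExporterQueueSaturated
    expr: otelcol_exporter_queue_size / otelcol_exporter_queue_capacity > 0.8
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Exporter queue above 80% for 10m - the bottleneck is likely downstream, not the collector"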

Root Causes That Scaling Cannot Fix

| Symptom | Likely Root Cause | Correct Fix |
| --- | --- | --- |
| Queue persistently > 80% | Backend saturated or slow | Scale the backend, reduce ingestion rate, or enable sampling |
| High otelcol_exporter_send_failed_spans | Backend down or misconfigured | Fix the backend; enable persistent queues (file_storage) as a buffer |
| Network egress throttled | Cloud network limits | Increase network quota; compress data (snappy/gzip); use Kafka as buffer |
| Memory OOM despite adding replicas | Data cardinality explosion | Apply cardinality limits; use filter processor to drop high-volume data |
| High CPU on all replicas | OTTL transform too complex | Simplify transforms; profile with pprof |
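
Two of the fixes above expressed as config, as a sketch; the dropped metric name is hypothetical, and gzip is one of the compression codecs the OTLP exporter supports:

# Sketch: compress exports and drop a known-noisy metric before export
processors:
  filter/noise:
    error_mode: ignore
    metrics:
      metric:
        - 'name == "http.client.request.body.size"'   # hypothetical high-volume metric
exporters:
  otlp:
    endpoint: backend.example.com:4317
    compression: gzip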

Decision Tree: Scale vs Fix

Is otelcol_exporter_queue_size / queue_capacity > 0.8?
├─ YES → Check backend: Is the backend CPU/disk/network saturated?
│         ├─ YES → Scale the BACKEND, not the collector
│         └─ NO  → Check export failures: Are there auth/TLS errors?
│                   ├─ YES → Fix configuration, not scale
│                   └─ NO  → Check network throughput limits
└─ NO  → Is collector CPU > 80%?
          ├─ YES → Scale the COLLECTOR (CPU-bound processing)
          └─ NO  → Is collector memory > 80% of limit?
                    ├─ YES → Increase memory limit or reduce queue_size
                    └─ NO  → Pipeline is healthy

Scaling Anti-Patterns

❌ Scaling collectors when the backend is the bottleneck - more replicas = more pressure on an already saturated backend
❌ Scaling without sampling - at >50k RPS, evaluate tail sampling or head sampling before adding replicas
❌ Scaling stateful collectors without sticky routing - more tail_sampling replicas without the loadbalancing exporter splits traces across instances, breaking sampling correctness
❌ Scaling to compensate for cardinality explosion - unbounded attributes cause exponential storage growth regardless of replica count

When Scaling DOES Help

✅ Collector CPU is the bottleneck (transform/filter heavy workloads)
✅ Receiver throughput is limited (single node accept queue full)
✅ Tail sampling replicas need more memory for in-flight traces
✅ Prometheus scraping needs more targets per collector



Summary

✅ Use DaemonSet for logs and host metrics (always)
✅ Use Gateway for tail sampling with the loadbalancing exporter (sticky sessions)
✅ Use Sidecar only for serverless or strict isolation
✅ Use Hybrid for production (Agent → Gateway → Backend)
✅ Use the Target Allocator for Prometheus scraping at scale
✅ Always configure HPA on memory utilization for gateways
✅ Always use a Headless Service with the loadbalancing exporter
✅ Before scaling, check whether the problem is downstream (backend saturation, not the collector)
✅ Use otel/opentelemetry-collector-contrib:0.147.0 for the current stable image

Deployment is not just about running a binary—it's about building a resilient, scalable observability pipeline.
