Expert OpenTelemetry guidance for collector configuration, pipeline design, and production telemetry instrumentation. Use when configuring collectors, designing pipelines, instrumenting applications, implementing sampling, managing cardinality, securing telemetry, writing OTTL transformations, or setting up AI coding agent observability (Claude Code, Codex, Gemini CLI, GitHub Copilot).
Sampling is the practice of selectively collecting traces to reduce volume and cost while maintaining observability. This reference provides deep technical guidance on head sampling, tail sampling, and the critical architectural requirements for production systems.
Production trace volume at scale: a system emitting 100,000 spans/second produces 8.64 billion spans per day.
With 1% sampling: volume drops to 86.4 million spans per day.
Trade-off: sampling reduces cost but introduces incomplete visibility.
Goal: Sample intelligently to minimize cost while maximizing signal.
Head sampling makes the sampling decision at the start of the trace (at the root span creation).
Request arrives → SDK generates trace_id → Hash(trace_id) % 100 < sampling_rate?
├─ YES → Trace is sampled (all spans are recorded)
└─ NO → Trace is NOT sampled (no spans are recorded)

Key Point: The decision is made before any spans are created. If a trace is not sampled, no spans are sent.
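The determinism of this decision can be illustrated with a small sketch. This is illustrative only; the actual OTel SDK samplers use a comparable trace_id-based threshold, but not this exact algorithm:

```python
import random

# Illustrative sketch of a deterministic head-sampling decision.
# Not the OTel SDK's exact algorithm.
def head_sampled(trace_id: int, ratio: float) -> bool:
    # Compare the low 64 bits of the trace_id against a fixed threshold;
    # the same trace_id always yields the same yes/no answer.
    threshold = int(ratio * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < threshold

# With a 10% ratio, roughly 1 in 10 random 128-bit trace IDs is kept,
# and repeating the check for one trace_id never changes the answer.
tid = random.getrandbits(128)
assert head_sampled(tid, 0.10) == head_sampled(tid, 0.10)
```

Because the decision depends only on the trace_id, every service that sees the same propagated trace_id can reach the same verdict, which is what makes head sampling work across process boundaries.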
✅ Simple: Easy to configure
✅ Efficient: No processing overhead (spans are never created)
✅ Deterministic: Same trace_id always produces the same decision
✅ Distributed: Works across microservices (via context propagation)
❌ Blind to content: Cannot sample based on span attributes (error status, latency)
❌ Statistical: May miss important traces (e.g., a rare error in an unsampled trace)
Always use ParentBased to ensure downstream services respect upstream sampling decisions.
Python:
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBasedTraceIdRatio

# Sample 10% of traces
sampler = ParentBasedTraceIdRatio(0.1)
provider = TracerProvider(sampler=sampler)

Go:
import (
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

sampler := sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.1))
provider := sdktrace.NewTracerProvider(
    sdktrace.WithSampler(sampler),
)

Java:
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.samplers.Sampler;

Sampler sampler = Sampler.parentBasedBuilder(Sampler.traceIdRatioBased(0.1)).build();
SdkTracerProvider provider = SdkTracerProvider.builder()
    .setSampler(sampler)
    .build();

Environment variables (any SDK):
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1  # 10%

Without ParentBased:
Service A (sampled) → Span A recorded ✅
Service B (not sampled) → Span B NOT recorded ❌

Result: Incomplete trace (broken trace graph)
With ParentBased:
Service A (sampled) → Span A recorded ✅
Service B (inherits sampling decision) → Span B recorded ✅

Result: Complete trace
✅ Low-traffic systems (<1,000 RPS): Simple and effective
✅ Cost-constrained environments: Immediate volume reduction at SDK
✅ Distributed systems: ParentBased ensures trace completeness across services
Tail sampling makes the sampling decision after all spans in a trace are collected, allowing intelligent decisions based on span content.
All spans collected → Assemble complete trace → Apply policies → Keep or Drop?
├─ ERROR status? → Keep 100%
├─ Latency > 500ms? → Keep 100%
├─ Rare endpoint? → Keep 100%
└─ Normal request → Keep 1%

Key Point: The collector buffers all spans for a trace, waits for the trace to complete, then decides.
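The OR-combination of policies sketched above can be expressed as a small decision function. This is a hypothetical sketch; the real logic lives in the collector's Go tail_sampling processor:

```python
import random

# Hypothetical sketch of OR-combined tail-sampling policies evaluated
# over a completed, buffered trace (the real implementation is the
# collector's Go tail_sampling processor).
def keep_trace(spans: list[dict], normal_rate: float = 0.01) -> bool:
    if any(s["status"] == "ERROR" for s in spans):
        return True                       # any error span: keep 100%
    if max(s["duration_ms"] for s in spans) > 500:
        return True                       # any slow span: keep 100%
    return random.random() < normal_rate  # normal traffic: keep ~1%
```

The key contrast with head sampling: this function needs the whole trace as input, which is why the collector must buffer spans until the trace is complete.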
✅ Intelligent: Keeps all errors, slow requests, and rare endpoints
✅ Cost-optimized: Drops boring traffic while retaining signals
✅ Flexible: Policy-based (error, latency, rate limiting)
❌ Complex: Requires stateful collector (sticky sessions)
❌ Resource-intensive: Must buffer spans in memory (high memory usage)
❌ Latency: Adds buffering delay (configurable, typically 10-30 seconds)
The tail_sampling processor supports multiple policies:
| Policy | Description | Use Case |
|---|---|---|
| always_sample | Keep 100% | Debug mode |
| status_code | Keep if span status = ERROR | Error tracking |
| latency | Keep if duration > threshold | SLO monitoring |
| rate_limiting | Keep N spans/second | Traffic limiting |
| string_attribute | Keep if attribute matches regex | Feature flags, A/B tests |
| probabilistic | Keep X% | Cost reduction for normal traffic |
| composite | Combine multiple policies (OR logic) | Production systems |
⚠️ Intentional client-side HTTP cancellations may not set span status to ERROR. Per current semantic convention guidance, client spans for user-driven aborts can remain UNSET, so an error-only status_code policy will not retain them. If canceled requests matter operationally, pair status_code with latency or attribute-based policies.
processors:
  tail_sampling:
    # Wait time before making decision
    decision_wait: 10s
    # Max spans in memory
    num_traces: 50000
    # Expected new traces per second
    expected_new_traces_per_sec: 1000
    policies:
      # Policy 1: Keep all errors
      - name: errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      # Policy 2: Keep slow requests (>500ms)
      - name: slow_requests
        type: latency
        latency:
          threshold_ms: 500
      # Policy 3: Keep specific endpoints
      - name: checkout_endpoint
        type: string_attribute
        string_attribute:
          key: http.route
          values: ["/checkout", "/payment"]
          enabled_regex_matching: false
      # Policy 4: Sample 1% of normal traffic
      - name: probabilistic
        type: probabilistic
        probabilistic:
          sampling_percentage: 1

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlp]

Formula:
Memory (GB) = (num_traces × avg_spans_per_trace × avg_span_size_bytes) / 1,000,000,000

Example:
num_traces: 50,000
avg_spans_per_trace: 10
avg_span_size: 1 KB
Memory = (50,000 × 10 × 1,000) / 1,000,000,000 = 0.5 GB

Best Practice: Set num_traces to handle decision_wait seconds of traffic:
num_traces = requests_per_second × decision_wait × spans_per_request

Example: 1,000 RPS × 10s × 10 spans = 100,000 traces
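The two sizing formulas reduce to one-line calculations; a quick check with the worked numbers (helper names here are hypothetical, chosen for clarity):

```python
# Hypothetical helpers for the two sizing formulas above.
def tail_sampling_memory_gb(num_traces: int, spans_per_trace: int,
                            span_size_bytes: int) -> float:
    # Worst-case buffer: every trace slot holds a full trace.
    return num_traces * spans_per_trace * span_size_bytes / 1_000_000_000

def required_num_traces(rps: int, decision_wait_s: int,
                        spans_per_request: int) -> int:
    # Enough slots to hold decision_wait seconds of incoming traffic.
    return rps * decision_wait_s * spans_per_request

print(tail_sampling_memory_gb(50_000, 10, 1_000))  # 0.5 (GB)
print(required_num_traces(1_000, 10, 10))          # 100000 (traces)
```

Note that at the recommended num_traces of 100,000 the same memory formula gives 1 GB for this workload, so sizing num_traces and sizing memory must be done together.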
Goal: Reduce to N spans/day
Formula:
sampling_rate = target_spans_per_day / current_spans_per_day

Example:
sampling_rate = 86.4M / 8.64B = 0.01 = 1%

Formula:
sampled_spans = total_spans × sampling_rate

Example:
sampled_spans = 100,000 × 0.05 = 5,000 spans/second

With head sampling, the probability that a multi-service trace is complete:
Formula:
P(complete) = sampling_rate ^ num_services

Example (3 microservices, 10% sampling):
P(complete) = 0.1^3 = 0.001 = 0.1%

Conclusion: Only 0.1% of traces will have all 3 services' spans.

Solution: Use ParentBased sampling to ensure 100% completion:
P(complete) = sampling_rate^1 = 0.1 = 10%

Now 10% of traces are sampled, and all sampled traces are complete.
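The sampling-math formulas above (target rate, post-sampling throughput, trace completeness) are easy to verify in a few lines; `p_complete` is a hypothetical helper name:

```python
# Worked examples for the sampling formulas above.
sampling_rate = 86.4e6 / 8.64e9      # target / current = 0.01 (1%)
sampled_spans = 100_000 * 0.05       # 5,000 spans/second at 5%

# Hypothetical helper: probability that all services' spans are kept
# when each of num_services samples independently (no ParentBased).
def p_complete(rate: float, num_services: int) -> float:
    return rate ** num_services

print(p_complete(0.1, 3))  # ~0.001: only 0.1% of traces are complete
# With ParentBased, only the root decides, so completeness is rate^1.
print(p_complete(0.1, 1))  # 0.1
```

The exponential decay in `p_complete` is why independent per-service sampling becomes unusable as service count grows, and why ParentBased is non-negotiable in distributed systems.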
Critical: Tail sampling requires all spans of a trace to arrive at the same collector instance.
Span A (trace_id: 123) → Collector Pod 1 (buffers Span A, waits for Span B)
Span B (trace_id: 123) → Collector Pod 2 (buffers Span B, waits for Span A)
Decision Time:
- Pod 1: Incomplete trace (missing Span B) → Drops or makes wrong decision
- Pod 2: Incomplete trace (missing Span A) → Drops or makes wrong decision

Result: Data loss or incorrect sampling
Span A (trace_id: 123) → Collector Pod 1
Span B (trace_id: 123) → Collector Pod 1
Decision Time:
- Pod 1: Complete trace (has Span A + Span B) → Correct decision

Use a two-tier architecture:
Agent (with loadbalancing exporter) → Gateway (with tail_sampling processor)

Agent Configuration:
exporters:
  loadbalancing:
    protocol:
      otlp:
        endpoint: placeholder  # overridden per-backend by the resolver
        tls:
          insecure: true
    routing_key: "traceID"  # Critical: ensures stickiness
    resolver:
      k8s:
        service: otel-gateway-headless
        ports:
          - 4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [loadbalancing]

Gateway Configuration:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 20
  tail_sampling:
    decision_wait: 10s
    num_traces: 100000
    expected_new_traces_per_sec: 1000
    policies:
      - name: errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow
        type: latency
        latency: {threshold_ms: 500}
      - name: default
        type: probabilistic
        probabilistic: {sampling_percentage: 1}
  batch:
    timeout: 10s
    send_batch_size: 1024

exporters:
  otlp:
    endpoint: backend.example.com:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlp]

Headless Service:
apiVersion: v1
kind: Service
metadata:
  name: otel-gateway-headless
spec:
  clusterIP: None  # Headless: returns pod IPs
  selector:
    app: otel-gateway
  ports:
    - name: otlp-grpc
      port: 4317

Test stickiness with trace correlation:
# Generate traces with same trace_id from different agents
# All spans should end up on the same gateway pod
kubectl logs -l app=otel-gateway | grep "trace_id: 123abc"

All logs for trace_id: 123abc should come from the same pod.
Combine both for optimal cost/signal:

Net effect: head sampling drops 90% of traffic at the SDK (no span creation overhead), then tail sampling keeps every sampled error and slow request while passing only 1% of the remaining normal traffic.

Configuration:

SDK (environment variables):
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1

Gateway:
processors:
  tail_sampling:
    policies:
      - name: errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow
        type: latency
        latency: {threshold_ms: 500}
      - name: default
        type: probabilistic
        probabilistic: {sampling_percentage: 1}

Limit to N spans/second per service:
processors:
  tail_sampling:
    policies:
      - name: rate_limit
        type: rate_limiting
        rate_limiting:
          spans_per_second: 100

Keep traces with specific feature flags:
processors:
  tail_sampling:
    policies:
      - name: beta_features
        type: string_attribute
        string_attribute:
          key: feature_flag.key
          values: ["beta_checkout", "new_ui"]
          enabled_regex_matching: false

The OpenTelemetry specification has an active proposal — open-telemetry/opentelemetry-specification#4854 — to introduce metric links: a concept analogous to span links, but for metrics.
What it would enable: attaching links from metric data points to related traces or spans, much as span links relate spans to one another.
Status: 📋 Proposal / Discussion — not yet adopted into a stable OTel specification release. Do not implement against this feature in production until it reaches a stable spec.
Track progress: https://github.com/open-telemetry/opentelemetry-specification/issues/4854
✅ Use head sampling (ParentBasedTraceIdRatio) for simple, distributed sampling
✅ Use tail sampling for intelligent, policy-based sampling (errors, latency)
✅ Always use ParentBased to ensure trace completeness across services
✅ Use loadbalancing exporter with routing_key: traceID for tail sampling stickiness
✅ Deploy two-tier architecture (Agent with loadbalancing → Gateway with tail_sampling)
✅ Size tail_sampling memory: num_traces = RPS × decision_wait × spans_per_trace
✅ Monitor otelcol_processor_tail_sampling_policy_decision metrics
Sampling is not about throwing away data—it's about keeping the right data at the right cost.