Expert OpenTelemetry guidance for collector configuration, pipeline design, and production telemetry instrumentation. Use when configuring collectors, designing pipelines, instrumenting applications, implementing sampling, managing cardinality, securing telemetry, writing OTTL transformations, or setting up AI coding agent observability (Claude Code, Codex, Gemini CLI, GitHub Copilot).
OpenTelemetry is available for a variety of platforms and environments, ensuring seamless observability in hybrid systems. This reference provides comprehensive guidance on deploying OpenTelemetry in different platform environments, with detailed focus on serverless functions (AWS Lambda, Azure Functions, GCP Cloud Functions) and client-side applications.
OpenTelemetry supports the following major platform categories:
| Platform | Description | Key Considerations |
|---|---|---|
| Functions as a Service (FaaS) | AWS Lambda, Azure Functions, GCP Cloud Functions | Cold starts, execution timeouts, async export |
| Client-Side Apps | Mobile (iOS, Android), Browser (JavaScript), Desktop | Data volume control, PII handling, battery life |
| Container Platforms | Kubernetes, Docker, ECS/Fargate | See architecture.md for details |
| Virtual Machines | EC2, Azure VMs, GCE | Traditional collector deployments |
Functions as a Service (FaaS) is an important serverless compute platform for cloud-native applications. However, platform quirks usually mean these applications have slightly different monitoring guidance and requirements than applications running on Kubernetes or Virtual Machines.
Serverless functions have unique characteristics that affect observability:
| Challenge | Impact | OpenTelemetry Solution |
|---|---|---|
| Short Execution Time | Function may terminate before telemetry export completes | Use Collector Extension Layer (async export) |
| Cold Starts | Initialization overhead increases latency | Minimize instrumentation scope, use provisioned concurrency |
| Limited Control | Cannot run DaemonSet or Gateway collectors | Use Lambda Layers or Collector Extensions |
| Timeouts | Function must complete within time limit | Never block on telemetry export |
| Cost Per Millisecond | Every millisecond of execution is billed | Async export to avoid blocking handler |
Pattern 1: Lambda Layer (Auto-Instrumentation)
Pattern 2: Collector Extension Layer
Pattern 3: Manual SDK Integration
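Patterns 1 and 2 are covered in depth below. For Pattern 3, here is a minimal Python sketch of manual SDK wiring; the service name and endpoint are illustrative assumptions, and the exporter targets the collector extension on localhost so the handler never blocks on a backend round-trip:

```python
import os

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Initialize once at module scope; the provider survives warm invocations
provider = TracerProvider(
    resource=Resource.create(
        {"service.name": os.environ.get("OTEL_SERVICE_NAME", "my-lambda")}
    )
)
# Export to the collector extension on localhost (Pattern 2 pairs with this)
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

def lambda_handler(event, context):
    with tracer.start_as_current_span("lambda_handler"):
        return {"statusCode": 200}
```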
Problem: Lambda may terminate the execution environment immediately after your handler returns, killing any in-flight telemetry exports.
Solution: Use the OpenTelemetry Collector Extension Layer to decouple export from execution.
┌─────────────────────────────────────┐
│ Lambda Execution │
│ │
│ ┌──────────────┐ ┌─────────────┐ │
│ │ Handler │ │ Collector │ │
│ │ Function │→ │ Extension │ │
│ └──────────────┘ └─────────────┘ │
│ ↓ ↓ │
│ Returns Exports │
│ Immediately Async │
└─────────────────────────────────────┘
Benefits:
✅ Handler returns immediately (no blocking on export)
✅ Telemetry exported asynchronously after handler completes
✅ Collector persists across invocations (reduces overhead)
✅ Data flushed during next invocation if Lambda freezes
For Node.js:
# ARN format (region-specific)
arn:aws:lambda:<region>:901920570463:layer:aws-otel-nodejs-<arch>-ver-1-18-1:4
# Example: us-east-1, x86_64
arn:aws:lambda:us-east-1:901920570463:layer:aws-otel-nodejs-amd64-ver-1-18-1:4
For Python:
arn:aws:lambda:<region>:901920570463:layer:aws-otel-python-<arch>-ver-1-25-0:3
For Java:
arn:aws:lambda:<region>:901920570463:layer:aws-otel-java-wrapper-<arch>-ver-1-32-0:3
Collector Extension Layer (required for async export):
arn:aws:lambda:<region>:901920570463:layer:aws-otel-collector-<arch>-ver-0-102-1:1
Essential Variables:
# Service identification
OTEL_SERVICE_NAME=my-lambda-function
# Export configuration (HTTP recommended for Lambda)
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
OTEL_EXPORTER_OTLP_ENDPOINT=https://your-backend:4318
# Resource attributes
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production,service.version=1.0.0
# Enable auto-instrumentation
AWS_LAMBDA_EXEC_WRAPPER=/opt/otel-handler # For Node.js/Python
⚠️ Go SDK caveat:
`OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf` is not enough by itself to switch the Go SDK from OTLP/gRPC to OTLP/HTTP. For Go Lambda functions, instantiate `otlptracehttp`/`otlpmetrichttp` in code, or use `go.opentelemetry.io/contrib/exporters/autoexport` if you need env-driven protocol selection.
Performance Optimization Variables:
# Minimize cold start overhead - disable unnecessary instrumentations
OTEL_INSTRUMENTATION_COMMON_DEFAULT_ENABLED=false
OTEL_INSTRUMENTATION_AWS_SDK_ENABLED=true
OTEL_INSTRUMENTATION_HTTP_ENABLED=true
# Export only traces (disable metrics/logs if not needed)
OTEL_METRICS_EXPORTER=none
OTEL_LOGS_EXPORTER=none
# Disable batch processor timeout (use extension's batching)
OTEL_BSP_SCHEDULE_DELAY=0
OTEL_BSP_MAX_QUEUE_SIZE=2048
Critical Settings for Non-Blocking Behavior:
# Force synchronous export to collector extension (not backend)
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 # Collector extension endpoint
# Collector extension handles async export to backend
# Set backend endpoint in collector extension config (see below)
Create collector.yaml in the Lambda function package:
# collector.yaml (packaged with the function; available at /var/task/collector.yaml)
receivers:
otlp:
protocols:
http:
endpoint: localhost:4318
processors:
batch:
timeout: 1s # Short timeout for Lambda's lifecycle
send_batch_size: 512
  memory_limiter:
    check_interval: 1s # Required by the memory_limiter processor
    limit_mib: 50 # Conservative for Lambda extension
    spike_limit_mib: 10
exporters:
otlp:
endpoint: your-backend.example.com:4317
tls:
insecure: false
timeout: 5s # Must complete before Lambda freezes
service:
pipelines:
traces:
receivers: [otlp]
processors: [memory_limiter, batch]
      exporters: [otlp]
Set the environment variable to use the custom config:
OPENTELEMETRY_COLLECTOR_CONFIG_FILE=/var/task/collector.yaml
Symptom: Traces are missing for slow or timed-out functions.
Solution 1: Increase Function Timeout
# serverless.yml or SAM template
Timeout: 30 # Ensure sufficient time for export
Solution 2: Configure Export Timeout
# Reduce export timeout to avoid blocking
OTEL_EXPORTER_OTLP_TIMEOUT=1000 # 1 second max
Solution 3: Use Collector Extension with Buffering
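The buffering configuration for the extension is the collector.yaml shown above. If the extension is not an option in your environment, a bounded flush is a pragmatic fallback that keeps worst-case handler overhead predictable. A Python sketch, assuming a `process` business-logic helper and a 500 ms budget (both illustrative):

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def lambda_handler(event, context):
    with tracer.start_as_current_span("handler"):
        result = process(event)  # assumed business-logic helper
    # Give export a hard 500 ms budget, then return regardless;
    # force_flush returns False on timeout instead of blocking indefinitely
    trace.get_tracer_provider().force_flush(timeout_millis=500)
    return result
```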
Always include these resource attributes for Lambda functions:
import os
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
# Set explicitly from the Lambda runtime's environment variables
resource = Resource.create({
"cloud.provider": "aws",
"cloud.platform": "aws_lambda",
"cloud.region": os.environ.get("AWS_REGION"),
"faas.name": os.environ.get("AWS_LAMBDA_FUNCTION_NAME"),
"faas.version": os.environ.get("AWS_LAMBDA_FUNCTION_VERSION"),
"faas.instance": os.environ.get("AWS_LAMBDA_LOG_STREAM_NAME"),
})
Span Attributes (set in handler):
import json
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

_cold_start = True

def is_cold_start():
    # True only for the first invocation in this execution environment
    global _cold_start
    value, _cold_start = _cold_start, False
    return value

def lambda_handler(event, context):
    with tracer.start_as_current_span("lambda_handler") as span:
        # FaaS semantic conventions
        span.set_attribute("faas.trigger", "http")  # or "pubsub", "timer", etc.
        # faas.execution was the legacy name for this attribute
        span.set_attribute("faas.invocation_id", context.aws_request_id)
        # Add cold start indicator
        span.set_attribute("faas.coldstart", is_cold_start())
        # Business context
        span.set_attribute("user.id", event.get("userId"))
        # Process request
        result = process_request(event)
        # Set status
        span.set_attribute("faas.result", "success")
        return result
❌ Blocking on export in handler:
# DON'T DO THIS
def lambda_handler(event, context):
    span = tracer.start_span("handler")
    result = process(event)
    span.end()
    trace.get_tracer_provider().force_flush()  # ❌ Blocks until export completes!
    return result
✅ Use collector extension (non-blocking):
# DO THIS
def lambda_handler(event, context):
    with tracer.start_as_current_span("handler") as span:
        result = process(event)
    # Handler returns immediately
    # Collector extension exports async
    return result
❌ Using gRPC protocol (high cold start overhead):
OTEL_EXPORTER_OTLP_PROTOCOL=grpc # ❌ Slow connection establishment
✅ Use HTTP/protobuf:
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf # ✅ Faster, more reliable
For Go functions, treat that env var as a configuration hint, not an exporter selector. The exporter package still determines the protocol unless you adopt the Go autoexport helper.
❌ Over-instrumenting (increases cold start):
# All instrumentations enabled by default
# ❌ Instruments libraries you don't use
✅ Selective instrumentation:
OTEL_INSTRUMENTATION_COMMON_DEFAULT_ENABLED=false
OTEL_INSTRUMENTATION_AWS_SDK_ENABLED=true # Only what you need
Critical Metrics to Monitor:
# Cold start rate
sum(rate(faas_coldstart_total{result="true"}[5m])) by (faas_name)
# Initialization overhead
histogram_quantile(0.95,
sum(rate(faas_init_duration_milliseconds_bucket[5m])) by (le, faas_name)
)
# Execution duration (include telemetry overhead)
histogram_quantile(0.95,
sum(rate(faas_execution_duration_milliseconds_bucket[5m])) by (le, faas_name)
)
# Timeout rate
sum(rate(faas_timeout_total[5m])) by (faas_name)
// handler.js - No instrumentation code needed with auto-instrumentation layer
exports.handler = async (event, context) => {
  const { trace } = require('@opentelemetry/api');
  // Determine cold start before annotating the span
  const isColdStart = !global._otelWarmStart;
  global._otelWarmStart = true;
  const span = trace.getActiveSpan();
  // Add custom attributes
  if (span) {
    span.setAttribute('faas.trigger', 'http');
    span.setAttribute('faas.coldstart', isColdStart);
    span.setAttribute('user.id', event.userId);
  }
  // Business logic
  const result = await processRequest(event);
  // Handler returns immediately
  // Collector extension handles export async
  return {
    statusCode: 200,
    body: JSON.stringify(result)
  };
};
Deployment Configuration (serverless.yml):
service: my-service
provider:
name: aws
runtime: nodejs18.x
architecture: x86_64
timeout: 30
memorySize: 512
environment:
OTEL_SERVICE_NAME: ${self:service}-${self:provider.stage}
OTEL_EXPORTER_OTLP_PROTOCOL: http/protobuf
OTEL_EXPORTER_OTLP_ENDPOINT: http://localhost:4318
OTEL_RESOURCE_ATTRIBUTES: deployment.environment=${self:provider.stage}
AWS_LAMBDA_EXEC_WRAPPER: /opt/otel-handler
OTEL_INSTRUMENTATION_COMMON_DEFAULT_ENABLED: false
OTEL_INSTRUMENTATION_AWS_SDK_ENABLED: true
OTEL_INSTRUMENTATION_HTTP_ENABLED: true
OTEL_METRICS_EXPORTER: none
OTEL_LOGS_EXPORTER: none
functions:
myFunction:
handler: handler.handler
layers:
- arn:aws:lambda:us-east-1:901920570463:layer:aws-otel-nodejs-amd64-ver-1-18-1:4
      - arn:aws:lambda:us-east-1:901920570463:layer:aws-otel-collector-amd64-ver-0-102-1:1
Azure Functions supports OpenTelemetry primarily through Application Insights.
Using Application Insights with OTLP:
{
"version": "2.0",
"logging": {
"applicationInsights": {
"samplingSettings": {
"isEnabled": true,
"maxTelemetryItemsPerSecond": 20
}
}
},
"extensionBundle": {
"id": "Microsoft.Azure.Functions.ExtensionBundle",
"version": "[4.*, 5.0.0)"
}
}
Environment Variables:
APPLICATIONINSIGHTS_CONNECTION_STRING=InstrumentationKey=...
OTEL_SERVICE_NAME=my-azure-function
OTEL_EXPORTER_OTLP_ENDPOINT=https://your-backend:4318
✅ Use Application Insights native integration for simplicity
✅ Configure sampling to control cost (default: 5 requests/second)
✅ Use Consumption Plan for variable workloads, Premium for consistent latency
✅ Set FUNCTIONS_WORKER_RUNTIME appropriately (node, python, dotnet, java)
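A minimal sketch of an HTTP-triggered function using the Python v2 programming model with the azure-monitor-opentelemetry distro; the route name and attribute values are illustrative assumptions:

```python
import azure.functions as func
from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import trace

# Reads APPLICATIONINSIGHTS_CONNECTION_STRING and wires the SDK pipeline
configure_azure_monitor()

app = func.FunctionApp()
tracer = trace.get_tracer(__name__)

@app.route(route="orders")  # assumed route name
def orders(req: func.HttpRequest) -> func.HttpResponse:
    with tracer.start_as_current_span("orders") as span:
        span.set_attribute("faas.trigger", "http")
        return func.HttpResponse("ok", status_code=200)
```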
GCP Cloud Functions (2nd generation) runs on Cloud Run, which supports standard OpenTelemetry SDKs and exporters.
Automatic Instrumentation (Node.js):
# package.json
{
"scripts": {
"start": "functions-framework --target=myFunction"
},
"dependencies": {
"@google-cloud/functions-framework": "^3.0.0",
"@opentelemetry/api": "^1.4.0"
}
}
Environment Variables:
OTEL_SERVICE_NAME=my-gcp-function
OTEL_TRACES_EXPORTER=google_cloud_trace
GOOGLE_CLOUD_PROJECT=my-project-id
Export to Custom Backend:
# OTLP/gRPC (no path):
OTEL_EXPORTER_OTLP_ENDPOINT=https://your-backend:4317
# OTLP/HTTP (signal-specific endpoint includes the path; default port 4318):
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=https://your-backend:4318/v1/traces
✅ Use 2nd generation Cloud Functions (runs on Cloud Run, better performance)
✅ Export to Cloud Trace for native GCP integration
✅ Configure minimum instances to avoid cold starts
✅ Use --timeout=60s and --max-instances=100 flags appropriately
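A minimal Python sketch for a 2nd gen HTTP function exporting to Cloud Trace; assumes the functions-framework and opentelemetry-exporter-gcp-trace packages, with the function name as an illustrative placeholder:

```python
import functions_framework
from opentelemetry import trace
from opentelemetry.exporter.cloud_trace import CloudTraceSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Module-scope init survives warm instances; the exporter resolves the
# project from GOOGLE_CLOUD_PROJECT / application default credentials
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(CloudTraceSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

@functions_framework.http
def my_function(request):
    with tracer.start_as_current_span("my_function") as span:
        span.set_attribute("faas.trigger", "http")
        return "ok"
```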
The processor.resourcedetection.removeGCPFaasID feature gate has been permanently removed in collector-contrib (#45808). This gate was introduced to let users opt back into the legacy faas.id attribute; its removal makes the change permanent:
- faas.id is no longer populated by the resourcedetection processor's GCP detector.
- Use faas.instance (the Cloud Run revision/instance ID) as the canonical FaaS instance identifier.
- If you pass processor.resourcedetection.removeGCPFaasID=false as a feature gate override in your collector deployment flags, remove that flag; it is no longer recognized and may cause collector startup errors.
Migration: Replace any references to faas.id in dashboards, alerts, and OTTL transforms with faas.instance:
# OTTL migration example: rename legacy faas.id to faas.instance
processors:
  transform:
    trace_statements:
      - context: resource
        statements:
          - set(attributes["faas.instance"], attributes["faas.id"]) where attributes["faas.id"] != nil
          - delete_key(attributes, "faas.id") where attributes["faas.id"] != nil
Client-side applications (mobile apps, web browsers, desktop apps) have unique constraints that require careful consideration when implementing OpenTelemetry. Unlike server-side applications, client-side apps run on user-controlled devices with limited resources, unreliable networks, and strict privacy requirements.
| Challenge | Impact | OpenTelemetry Solution |
|---|---|---|
| Battery Drain | Excessive telemetry collection drains device battery | Aggressive sampling (1-5%), batch exports (5-10s intervals) |
| Data Usage | Users on metered connections pay for telemetry data | Compress payloads, sample aggressively, avoid verbose attributes |
| Bundle Size | Large SDK increases app download size | Use tree-shaking, import only needed modules, consider vendor SDKs |
| Privacy | User data must be protected (GDPR, CCPA) | Never collect PII, mask sensitive DOM elements, use consent management |
| Network Reliability | Offline or flaky connections cause data loss | Buffer telemetry locally, retry with exponential backoff |
| Performance | Instrumentation must not impact UX | Async instrumentation, <1% CPU overhead, avoid blocking main thread |
1. Aggressive Sampling (1-5%)
Mobile apps generate high volumes of user interactions. Sampling is mandatory to control data volume and battery usage.
iOS (Swift):
import OpenTelemetryApi
import OpenTelemetrySdk
// Configure with 1% sampling for production
let resource = Resource(attributes: [
"service.name": "my-ios-app",
"service.version": "1.2.3",
"deployment.environment": "production",
"device.model.name": UIDevice.current.model,
"device.os.version": UIDevice.current.systemVersion
])
// ParentBased ensures trace continuity across services
let sampler = ParentBasedSampler(root: TraceIdRatioBasedSampler(ratio: 0.01)) // 1% sampling
let tracerProvider = TracerProviderBuilder()
.with(resource: resource)
.with(sampler: sampler)
.build()
OpenTelemetry.registerTracerProvider(tracerProvider: tracerProvider)
Android (Kotlin):
For Android-only JVM apps, the Java SDK remains the most established production path. OpenTelemetry's native Kotlin Multiplatform SDK is now in active development for Android, JVM, iOS, and JS targets, with Android/JVM the most battle-tested so far. Prefer that track when you explicitly need shared Android/iOS/JS instrumentation, but treat its logging and tracing APIs as emerging until you have validated target-platform support and API maturity for your deployment.
import android.os.Build
import io.opentelemetry.api.GlobalOpenTelemetry
import io.opentelemetry.api.common.Attributes
import io.opentelemetry.sdk.OpenTelemetrySdk
import io.opentelemetry.sdk.resources.Resource
import io.opentelemetry.sdk.trace.SdkTracerProvider
import io.opentelemetry.sdk.trace.samplers.Sampler
// ResourceAttributes ships in the opentelemetry-semconv artifact; its package varies by version
import io.opentelemetry.semconv.resource.attributes.ResourceAttributes
val resource = Resource.create(
Attributes.of(
ResourceAttributes.SERVICE_NAME, "my-android-app",
ResourceAttributes.SERVICE_VERSION, "1.2.3",
ResourceAttributes.DEPLOYMENT_ENVIRONMENT, "production",
ResourceAttributes.DEVICE_MODEL_NAME, Build.MODEL,
ResourceAttributes.OS_VERSION, Build.VERSION.RELEASE
)
)
// ParentBased sampler for trace continuity
val sampler = Sampler.parentBased(Sampler.traceIdRatioBased(0.01)) // 1% sampling
val tracerProvider = SdkTracerProvider.builder()
.setResource(resource)
.setSampler(sampler)
.build()
val openTelemetry = OpenTelemetrySdk.builder()
.setTracerProvider(tracerProvider)
.build()
GlobalOpenTelemetry.set(openTelemetry)
2. Battery-Aware Export Configuration
Configure batch exports to minimize wake-ups and network calls:
iOS:
import OpenTelemetrySdk
let batchProcessor = BatchSpanProcessor(
spanExporter: OtlpHttpTraceExporter(endpoint: exporterEndpoint),
scheduleDelay: 10.0, // Export every 10 seconds
maxQueueSize: 512, // Smaller queue for mobile
maxExportBatchSize: 256 // Smaller batches
)
tracerProvider.addSpanProcessor(batchProcessor)
Android:
val batchProcessor = BatchSpanProcessor.builder(
    OtlpHttpSpanExporter.builder()
        .setEndpoint("https://your-backend.example.com/v1/traces")
        .build()
)
    .setScheduleDelay(10, TimeUnit.SECONDS) // Export every 10 seconds
    .setMaxQueueSize(512)
    .setMaxExportBatchSize(256)
    .build()

// In the Java/Android SDK, processors are attached at build time
val tracerProvider = SdkTracerProvider.builder()
    .addSpanProcessor(batchProcessor)
    .build()
3. Network Reliability & Offline Support
Handle offline scenarios gracefully by buffering telemetry:
iOS (using local storage):
import Foundation
// Pseudocode template - requires implementation for your specific needs
class OfflineStorage {
static let queue = DispatchQueue(label: "telemetry.offline")
static let maxStoredSpans = 1000
static func saveSpan(_ span: ReadableSpan) {
queue.async {
// Example: Serialize span to JSON and save to UserDefaults
// let data = try? JSONEncoder().encode(span)
// UserDefaults.standard.set(data, forKey: "span_\(UUID().uuidString)")
// Implement LRU eviction if count > maxStoredSpans
}
}
static func flushOfflineSpans() {
queue.async {
// Example: Retrieve stored spans and export when online
// let spans = retrieveStoredSpans()
// exporter.export(spans) { result in ... }
// Clear storage after successful export
}
}
}
Android (using Room database):
// Pseudocode template - requires Room database setup
@Dao
interface TelemetryDao {
@Insert
fun insertSpan(span: SpanEntity)
@Query("SELECT * FROM spans LIMIT 100")
fun getSpans(): List<SpanEntity>
@Delete
fun deleteSpans(spans: List<SpanEntity>)
}
// Export when network available
if (isNetworkAvailable()) {
val spanEntities = database.telemetryDao().getSpans()
// Convert SpanEntity to proper SDK format before export
// val spans = spanEntities.map { it.toReadableSpan() }
// exporter.export(spans)
database.telemetryDao().deleteSpans(spanEntities)
}
4. Privacy & PII Handling
Never collect PII without explicit user consent:
iOS:
// ❌ DON'T: Collect PII
span.setAttribute("user.email", user.email)
span.setAttribute("user.phone", user.phone)
// ✅ DO: Use anonymous IDs
span.setAttribute("user.id", user.anonymousId) // Hashed or UUID
span.setAttribute("user.segment", user.segment) // "premium", "free"Android:
// ❌ DON'T: Collect sensitive data
span.setAttribute("credit_card", cardNumber)
// ✅ DO: Use non-identifying attributes
span.setAttribute("payment.method", "credit_card")
span.setAttribute("payment.provider", "stripe")5. Monitor SDK Overhead
Track OpenTelemetry's impact on app performance:
// iOS: Monitor CPU and memory usage
func reportMemoryUsage() -> UInt64 {
var info = mach_task_basic_info()
var count = mach_msg_type_number_t(MemoryLayout<mach_task_basic_info>.size)/4
let kerr: kern_return_t = withUnsafeMutablePointer(to: &info) {
$0.withMemoryRebound(to: integer_t.self, capacity: 1) {
task_info(mach_task_self_, task_flavor_t(MACH_TASK_BASIC_INFO), $0, &count)
}
}
return kerr == KERN_SUCCESS ? info.resident_size : 0
}
// Foundation has no direct per-process CPU-time API; use wall-clock init
// time plus the memory probe above as a cheap overhead signal
let startTime = Date()
let startMemory = reportMemoryUsage()
// Initialize OpenTelemetry
setupOpenTelemetry()
let initTime = Date().timeIntervalSince(startTime)
let memoryOverhead = reportMemoryUsage() - startMemory
// Should be well under 1% CPU and <5MB memory
1. Minimize Bundle Size
Use tree-shaking and import only necessary modules:
// ❌ DON'T: Import entire SDK
import * as opentelemetry from '@opentelemetry/api';
// ✅ DO: Import specific modules
import { trace } from '@opentelemetry/api';
import { WebTracerProvider } from '@opentelemetry/sdk-trace-web';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { ZoneContextManager } from '@opentelemetry/context-zone';
import { DocumentLoadInstrumentation } from '@opentelemetry/instrumentation-document-load';
import { UserInteractionInstrumentation } from '@opentelemetry/instrumentation-user-interaction';
2. CORS Configuration
Backend must support CORS for browser-based OTLP export:
Backend CORS Headers Required:
Access-Control-Allow-Origin: https://your-app.com
Access-Control-Allow-Methods: POST, OPTIONS
Access-Control-Allow-Headers: Content-Type, X-API-Key
Access-Control-Max-Age: 86400
Exporter Configuration:
const exporter = new OTLPTraceExporter({
  // HTTP, not gRPC (browsers can't speak native gRPC)
  url: 'https://your-backend.example.com/v1/traces',
  headers: {
    'X-API-Key': 'your-api-key'
    // Content-Type is set by the exporter; don't override it
  }
});
3. User Privacy & DOM Masking
Mask sensitive DOM elements to prevent PII leakage:
import { UserInteractionInstrumentation } from '@opentelemetry/instrumentation-user-interaction';
const userInteractionInstrumentation = new UserInteractionInstrumentation({
// Don't capture text from password, credit card fields
eventNames: ['click', 'submit'],
shouldPreventSpanCreation: (eventType, element, span) => {
// Skip instrumentation for sensitive forms
if (element.type === 'password' || element.name === 'creditCard') {
return true;
}
return false;
}
});
4. Complete Browser Setup
import { trace } from '@opentelemetry/api';
import { WebTracerProvider } from '@opentelemetry/sdk-trace-web';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { ZoneContextManager } from '@opentelemetry/context-zone';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { ParentBasedSampler, TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';
// Resource attributes for browser
const resource = new Resource({
[SemanticResourceAttributes.SERVICE_NAME]: 'my-web-app',
[SemanticResourceAttributes.SERVICE_VERSION]: '1.0.0',
[SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: 'production'
});
// HTTP exporter (not gRPC)
const exporter = new OTLPTraceExporter({
url: 'https://your-backend.example.com/v1/traces',
headers: { 'X-API-Key': 'your-api-key' }
});
const provider = new WebTracerProvider({
resource: resource,
// 5% sampling for browser traffic
sampler: new ParentBasedSampler({
root: new TraceIdRatioBasedSampler(0.05)
})
});
// Batch exports every 5 seconds
provider.addSpanProcessor(new BatchSpanProcessor(exporter, {
maxQueueSize: 100,
maxExportBatchSize: 50,
scheduledDelayMillis: 5000 // 5 seconds
}));
// Zone context manager for async operations
provider.register({
contextManager: new ZoneContextManager()
});5. Handle Offline Scenarios
Buffer telemetry when offline using localStorage:
// Note: This is a simplified example for illustration
// In production, use a SpanProcessor to buffer spans properly
const offlineStorage = {
maxSpans: 500,
saveSpanData(spanData) {
const spans = JSON.parse(localStorage.getItem('otel_spans') || '[]');
spans.push(spanData);
// LRU eviction
if (spans.length > this.maxSpans) {
spans.shift();
}
localStorage.setItem('otel_spans', JSON.stringify(spans));
},
// This is pseudocode - actual implementation requires proper span reconstruction
async flushOfflineSpans() {
const spanData = JSON.parse(localStorage.getItem('otel_spans') || '[]');
if (spanData.length > 0 && navigator.onLine) {
// In practice, use a custom SpanProcessor to handle offline scenarios
// The exporter expects proper ReadableSpan objects
localStorage.removeItem('otel_spans');
}
}
};
// Flush when back online
window.addEventListener('online', () => {
offlineStorage.flushOfflineSpans();
});
6. Performance Monitoring
Track SDK overhead to ensure <1% performance impact:
const startTime = performance.now();
// Note: performance.memory is Chrome-specific and not available in Firefox/Safari
const startMemory = performance.memory?.usedJSHeapSize || 0;
// Initialize OpenTelemetry
initializeOpenTelemetry();
const initTime = performance.now() - startTime;
const memoryUsage = (performance.memory?.usedJSHeapSize || 0) - startMemory;
console.log(`OTel init time: ${initTime}ms`);
console.log(`OTel memory: ${memoryUsage / 1024 / 1024}MB (Chrome only)`);
// Should be <100ms init, <5MB memory
❌ Sampling at 100%: Results in excessive battery drain and data usage
// DON'T
sampler: new AlwaysOnSampler()
✅ Use 1-5% sampling:
// DO
sampler: new TraceIdRatioBasedSampler(0.01)
❌ Collecting PII: Violates GDPR, CCPA
// DON'T
span.setAttribute('user.email', email);
✅ Use anonymous IDs:
// DO
span.setAttribute('user.id', anonymousId);
❌ Synchronous exports: Blocks main thread
// DON'T
exporter.export(spans); // Blocks UI
✅ Batch asynchronously:
// DO
provider.addSpanProcessor(new BatchSpanProcessor(exporter));
❌ Large bundle size: Increases page load time
// DON'T: Import everything
import '@opentelemetry/auto-instrumentations-web';
✅ Selective imports:
// DO: Only import what you need
import { DocumentLoadInstrumentation } from '@opentelemetry/instrumentation-document-load';
Key attributes to record across FaaS and client platforms:
- The faas.coldstart attribute
- cloud.platform, faas.name, faas.trigger
- device.model.name, os.version, service.version for debugging
OpenTelemetry and Prometheus are actively working to stabilize the Prometheus → OTLP Native Histogram conversion path. The upstream specification issue open-telemetry/opentelemetry-specification#4748 tracks the stabilization of Prometheus Native Histograms in the OTLP conversion layer.
What this means: stabilization is in progress upstream; check the spec issue for the latest state before relying on end-to-end native histogram fidelity in production.
Track progress: https://github.com/open-telemetry/opentelemetry-specification/issues/4748
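If you want OTLP-side exponential histograms today (the representation native histograms map onto), the Python SDK can opt instruments in via a View. A sketch; the instrument name is an illustrative assumption:

```python
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.view import (
    ExponentialBucketHistogramAggregation,
    View,
)

# Route a specific histogram instrument to exponential buckets, which OTLP
# carries as ExponentialHistogram data points; add metric readers as usual
provider = MeterProvider(
    views=[
        View(
            instrument_name="http.server.duration",  # assumed instrument name
            aggregation=ExponentialBucketHistogramAggregation(),
        )
    ]
)
```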
The Prometheus exporter specification is now mixed stability, and the mapping of OTLP Resource attributes into Prometheus target_info remains under active upstream work (spec#4927, spec#4926).
What this means in practice:
- Don't rely on target_info carrying all routing metadata.
- Promote the resource attributes you need to metric labels explicitly (e.g., with the transform processor) instead of depending on implicit exporter conversion.
Platform-specific OpenTelemetry deployments require careful consideration of execution limits, cold starts, network reliability, data volume, and privacy.
Always prioritize non-blocking telemetry export in serverless environments to avoid timeouts and unexpected costs.
For more information on deployment architectures, see architecture.md. For instrumentation patterns, see instrumentation.md.