Expert OpenTelemetry guidance for collector configuration, pipeline design, and production telemetry instrumentation. Use when configuring collectors, designing pipelines, instrumenting applications, implementing sampling, managing cardinality, securing telemetry, writing OTTL transformations, or setting up AI coding agent observability (Claude Code, Codex, Gemini CLI, GitHub Copilot).
This reference is a routing-friendly playbook index for OpenTelemetry blog content that is relevant to this skill. The goal is not to retell one company story in detail. The goal is to help the skill map user questions to the most relevant upstream operating patterns, then load the right deep-dive references from this repository.
The 2025 Developer Experience SIG survey explicitly called out the need for better production examples, debugging guidance, and more concrete deployment guidance. This document turns that need into a scalable maintenance model for future blog routing.
Use this document when a user asks for:
These playbooks are not meant to be copied verbatim. They should be used to answer questions like:
For each routed playbook, load deeper reference material from this repository as needed:
As more OpenTelemetry.io blog posts are integrated, keep each playbook entry in this shape:
This structure keeps the skill generic. It routes by user intent and technical problem space, not by a specific company name.
These are the most relevant recent 2025 and early-2026 opentelemetry.io blog posts to route through this skill today. The list is intentionally topic-driven and open-ended so future entries can be added without restructuring the document.
| Blog | Primary routing signals | Why it matters for the skill | Load next |
|---|---|---|---|
| Kubernetes annotation-based discovery for the OpenTelemetry Collector | receiver_creator, annotation-based discovery, Kubernetes self-service scraping, pod annotations | Strong playbook for self-service Collector onboarding with platform safety rails | collector, platforms |
| Observing Lambdas using the OpenTelemetry Collector Extension Layer | Lambda, serverless, extension layer, decouple processor, delayed export | Covers ephemeral runtime constraints and decoupled export patterns | platforms, collector, monitoring |
| Exposing OTel Collector in Kubernetes with Gateway API & mTLS | Gateway API, mTLS, external OTLP ingress, multi-cluster collector, hybrid cloud | Practical security and ingress pattern for centralized collector deployments | security, architecture, collector |
| How Mastodon Runs OpenTelemetry Collectors in Production | small team, one collector per namespace, OpenTelemetry Operator, Argo CD, tail sampling, vendor-neutral observability | Strong operating model for keeping collector deployments simple, declarative, and reliable while preserving backend choice at production scale | architecture, collector, monitoring |
| OpenTelemetry Profiles Enters Public Alpha | profiles, profiling, OTLP Profiles, eBPF profiler, pprof receiver, profile correlation | Good routing target when users ask how continuous profiling fits into OpenTelemetry, especially around collector support and cross-signal correlation | collector, platforms, monitoring |
| Demystifying Automatic Instrumentation: How the Magic Actually Works | auto-instrumentation, zero-code, bytecode instrumentation, eBPF, runtime hooks | Helps the skill explain which automatic instrumentation mechanism fits a runtime | instrumentation, platforms |
| OpenTelemetry Logging and You | logs, events, Logs API, log bridges, signal correlation | Useful when users ask how logs relate to traces and metrics in OTel's model | instrumentation, collector |
| How to Name Your Spans | span naming, low cardinality, semantic conventions, business spans | Good routing target for custom instrumentation and naming guidance | instrumentation |
| How to Name Your Span Attributes | attribute naming, semantic conventions, custom attributes, reserved namespaces | Helps the skill answer detailed questions about attribute design and stability | instrumentation |
| How to Name Your Metrics | metric naming, units, metric cardinality, service.name, semantic conventions | Important for metric schema hygiene and cross-service aggregation advice | instrumentation, monitoring |
| OpenTelemetry Sampling update | consistent sampling, TraceState, probability sampling, W3C TraceContext | Strong route for advanced sampling questions beyond basic head vs tail framing | sampling |
| The Declarative configuration journey: Why it took 5 years to ignore health check endpoints in tracing | declarative config, config file, health check exclusion, Java agent config | Good route for questions about portable config, rule-based routing, and YAML-first OTel setup | instrumentation, sampling |
| OTTL contexts just got easier with context inference | OTTL, transform processor, context inference, Collector transforms | Useful when users need simpler transform-processor guidance and want to avoid manual context selection mistakes | collector, connectors |
| Announcing Support for Complex Attribute Types in OTel | complex attributes, maps, heterogeneous arrays, structured telemetry | Helps the skill answer when complex payloads belong in attributes and when flat attributes remain the better design | instrumentation |
| Announcing the Beta Release of OpenTelemetry Go Auto-Instrumentation using eBPF | Go auto-instrumentation, eBPF, runtime hooks, zero-code Go | Adds a concrete runtime-specific route for Go users beyond generic auto-instrumentation explanations | instrumentation, platforms |
| Alibaba, Datadog, and Quesma Join Forces on Go Compile-Time Instrumentation | Go compile-time instrumentation, toolexec, zero-code Go, build-time instrumentation | Good route when users compare compile-time instrumentation with eBPF or manual Go instrumentation | instrumentation |
| Announcing the RPC Semantic Conventions stabilization project | RPC semantic conventions, gRPC telemetry, convention migration, stabilization | Useful for questions about RPC naming, migration windows, and convention stability expectations | instrumentation |
| Contributing the Unroll Processor to the OpenTelemetry Collector Contrib | unroll processor, bundled logs, record expansion, transform vs purpose-built processor | Adds a routing path for log-pipeline questions where bundled payload expansion should not be forced into OTTL transforms | collector, monitoring |
| How Mastodon Runs OpenTelemetry Collectors in Production | small team, Operator-managed collectors, one collector per namespace, Datadog connector, tail sampling in production | Strong production routing example for keeping collector architecture simple, using the OpenTelemetry Operator for lifecycle, and controlling volume with aggressive error-first sampling | architecture, collector, sampling |
| OpenTelemetry Profiles Enters Public Alpha | profiles, continuous profiling, eBPF profiler, pprof receiver, profile signal | Useful when users ask about bringing profiling into an OTel pipeline; it sets the right expectation that Profiles are practical to evaluate but still Alpha for critical production workloads | collector, platforms |
These patterns are intentionally generic so the skill can scale as more blogs are added.
The skill should match on the user's technical goal, such as Lambda export, secure collector ingress, or naming guidance, not on a company name from a blog post.
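Intent-based routing of this kind can be sketched as a small keyword matcher. This is an illustration only: the `Playbook` class and `route` function are hypothetical names, the three entries are copied from rows of the routing table, and plain substring matching stands in for whatever matching the skill actually performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Playbook:
    blog: str
    signals: tuple[str, ...]    # lowercased "Primary routing signals"
    load_next: tuple[str, ...]  # local references to load afterwards


# A few rows copied from the routing table; the rest are omitted for brevity.
PLAYBOOKS = (
    Playbook(
        blog="Observing Lambdas using the OpenTelemetry Collector Extension Layer",
        signals=("lambda", "serverless", "extension layer",
                 "decouple processor", "delayed export"),
        load_next=("platforms", "collector", "monitoring"),
    ),
    Playbook(
        blog="Exposing OTel Collector in Kubernetes with Gateway API & mTLS",
        signals=("gateway api", "mtls", "external otlp ingress",
                 "multi-cluster collector", "hybrid cloud"),
        load_next=("security", "architecture", "collector"),
    ),
    Playbook(
        blog="How to Name Your Spans",
        signals=("span naming", "low cardinality",
                 "semantic conventions", "business spans"),
        load_next=("instrumentation",),
    ),
)


def route(question: str) -> list[Playbook]:
    """Return playbooks ordered by how many routing signals the question mentions."""
    text = question.lower()
    scored = []
    for playbook in PLAYBOOKS:
        hits = sum(1 for signal in playbook.signals if signal in text)
        if hits:
            scored.append((hits, playbook))
    # Highest hit count first; ties keep table order because the sort is stable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [playbook for _, playbook in scored]
```

A question that names only a company matches no signal and returns an empty list, which is the intended behavior: the router keys on the technical problem, not on who wrote the blog post.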
Good playbooks let application teams opt in through narrow, well-defined interfaces while the platform retains the right guardrails.
For spans, attributes, and metrics, prefer low-cardinality names and put varying context in attributes or resource metadata.
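A minimal sketch of that naming rule for an HTTP server span follows. The helper `span_name_and_attributes` and its ID-detecting regular expression are hypothetical and exist only for illustration; in practice the route template usually comes from the web framework rather than from a heuristic. The attribute keys are taken from the HTTP semantic conventions.

```python
import re

# Path segments that look like numeric IDs or UUIDs (illustrative heuristic only).
_ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$"
)


def span_name_and_attributes(method: str, path: str) -> tuple[str, dict[str, str]]:
    """Build a low-cardinality span name and keep the varying path as an attribute."""
    segments = path.split("/")
    # Replace ID-like segments with a placeholder so every request to the same
    # endpoint shares one span name.
    route = "/".join("{id}" if _ID_SEGMENT.match(s) else s for s in segments)
    name = f"{method.upper()} {route}"
    attributes = {
        "http.request.method": method.upper(),
        "http.route": route,
        "url.path": path,  # the high-cardinality value lives here, not in the name
    }
    return name, attributes
```

With this shape, requests for `/users/12345/orders/987` and `/users/7/orders/1` produce the same span name, `GET /users/{id}/orders/{id}`, while the concrete path stays queryable through `url.path`.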
If telemetry crosses clusters, networks, or trust domains, route to patterns that include explicit authentication, encryption, and ownership boundaries.
Ephemeral runtimes like Lambda need different collector and export patterns than long-running Kubernetes workloads.
"Auto-instrumentation" is not a single implementation strategy. The right mechanism depends on runtime behavior, deployment model, and operational constraints.
As OTel setups grow, YAML-first or schema-driven configuration becomes easier to review, reuse, and scale than scattered ad hoc flags.
A blog route should be the front door. The implementation details should still come from the local references in this repository.
This makes the skill brittle and limits reuse as more upstream blogs are added.
Different runtimes use different mechanisms with different trade-offs.
That breaks aggregation, increases cardinality, and makes dashboards harder to reuse.
External OTLP ingress should be treated as a security-sensitive boundary.
Serverless systems need export paths that respect execution and billing limits.
As environments scale, declarative and shared configuration becomes more maintainable.
Some user questions require consistent probability sampling and TraceState-aware explanations.
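To make the idea concrete, here is a simplified sketch of a consistent probability sampling decision. It rests on assumptions that go beyond this document and reflect my reading of the OpenTelemetry probability sampling specification: the low 56 bits of the trace ID serve as the randomness value, a span is kept when that value is at or above a rejection threshold, and the threshold travels in the W3C `tracestate` header as `th` under the `ot` key with trailing zeros removed. All function names are hypothetical, and details such as explicit randomness values and precision limits are left out.

```python
RANDOMNESS_BITS = 56
MAX_THRESHOLD = 1 << RANDOMNESS_BITS  # 2**56


def rejection_threshold(probability: float) -> int:
    """Map a sampling probability in (0, 1] to a 56-bit rejection threshold."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return int((1.0 - probability) * MAX_THRESHOLD)


def encode_th(threshold: int) -> str:
    """Hex-encode the threshold as 14 digits, then drop trailing zeros."""
    return format(threshold, "014x").rstrip("0") or "0"


def randomness_from_trace_id(trace_id_hex: str) -> int:
    """Use the last 14 hex characters (56 bits) of the trace ID as randomness."""
    return int(trace_id_hex[-14:], 16)


def should_sample(trace_id_hex: str, probability: float) -> bool:
    """Every participant that sees the same trace ID reaches the same decision."""
    return randomness_from_trace_id(trace_id_hex) >= rejection_threshold(probability)


def tracestate_entry(probability: float) -> str:
    """Build the tracestate entry that advertises the threshold downstream."""
    return f"ot=th:{encode_th(rejection_threshold(probability))}"
```

Because the decision is a pure function of the trace ID and the threshold, a downstream sampler running at a lower probability keeps a subset of the traces the upstream sampler kept, which is what lets backends reason about sampled counts consistently.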
- ✅ Keep production playbooks generic, reusable, and routing-friendly
- ✅ Use an expandable 2025-2026 blog routing scan instead of centering the document on one org
- ✅ Route by technical problem space such as serverless, ingress, logs, metrics, naming, transforms, and sampling
- ✅ Treat blog posts as entry points and local references as the detailed implementation guides
- ⚠️ Avoid coupling the skill to company-specific narratives when the same pattern can be expressed generically
- ⚠️ Keep expanding this index as new upstream blog posts become relevant to the skill