o11y-dev/opentelemetry-skill

Expert OpenTelemetry guidance for collector configuration, pipeline design, and production telemetry instrumentation. Use when configuring collectors, designing pipelines, instrumenting applications, implementing sampling, managing cardinality, securing telemetry, writing OTTL transformations, or setting up AI coding agent observability (Claude Code, Codex, Gemini CLI, GitHub Copilot).

Tessl registry scores: 93 · Quality 97% (does it follow best practices?) · Impact 85%, 7.08x (average across 4 eval scenarios) · Security by Snyk: Passed, no known issues


OpenTelemetry Skill: A Cognitive Architecture for AI-Assisted Observability Engineering

Overview

The opentelemetry-skill is an AI assistant skill designed to help with OpenTelemetry configuration and observability engineering tasks. This skill employs progressive disclosure to optimize context usage and deliver production-ready OpenTelemetry configurations.

This repository contains the source code for the OpenTelemetry Skill tile released by Tessl.

Key Features

Comprehensive Coverage: Specialized reference docs covering collector architecture, security, sampling, AI agents, and compatibility

Production Focus: Emphasizes stability, security, and cost optimization patterns

AI Agent Support: Configuration guidance for monitoring AI coding agents alongside traditional applications

Progressive Loading: Context-aware reference loading prevents information overload

Continuous Updates: Automated upstream monitoring tracks OpenTelemetry releases and AI agent repositories

What Makes This Different?

Rather than loading the entire OpenTelemetry documentation into an AI's context (which invites hallucinations and information overload), this skill acts as a cognitive router:

  1. System 2 Thinking: Forces the AI to analyze critical observability signals (throughput, cardinality, resiliency) before generating code
  2. Progressive Disclosure: Loads detailed reference materials only when specific topics are triggered
  3. Production-First: Prioritizes stability, security, and cost optimization over feature completeness
  4. Convention Enforcement: Ensures semantic conventions, proper processor ordering, and architectural best practices
  5. AI Agent Support: Includes guidance for observing AI coding agents in production environments
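To make "proper processor ordering" concrete, here is a minimal pipeline sketch following the commonly recommended order: memory_limiter first, so the collector refuses data before it runs out of memory, and batch last, just before export. The backend endpoint and the numeric limits are placeholders, not values taken from the skill.

```yaml
# Minimal collector pipeline illustrating conventional processor ordering.
# Endpoint and limits are placeholders; size them for your own workload.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  # Refuse incoming data early once memory use crosses the configured limit.
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 25
  # Group telemetry into batches to reduce per-request export overhead.
  batch:
    send_batch_size: 8192
    timeout: 5s

exporters:
  otlp:
    endpoint: backend.example.com:4317  # hypothetical backend

service:
  pipelines:
    traces:
      receivers: [otlp]
      # memory_limiter runs first, batch runs last
      processors: [memory_limiter, batch]
      exporters: [otlp]
```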

Core Features

  • 🧠 Cognitive Architecture: Meta-knowledge layer that teaches AI how to think about observability
  • 📊 Cardinality Management: Built-in guards against metric explosion and cost overruns
  • 🏗️ Deployment Patterns: DaemonSet vs Gateway vs Sidecar decision matrices for Kubernetes
  • 🔒 Security by Default: PII redaction, TLS, and authentication patterns
  • 🔄 OTTL Transformations: Comprehensive OpenTelemetry Transformation Language guidance with patterns and best practices
  • 📈 Scaling Strategies: Load balancing with sticky sessions for tail sampling
  • 🎯 Sampling Intelligence: Head vs tail sampling with statistical trade-off analysis
  • 🔍 Meta-Monitoring: Self-observability patterns for collector health
  • 🤖 AI Agent Observability: Configuration guides for monitoring AI coding agents including Claude Code, Gemini CLI, GitHub Copilot, Codex CLI, Qwen Code, Pi Agent, and more via OpenTelemetry
  • Test & Validation Framework: TDD-based testing methodology to ensure skill effectiveness
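As a small illustration of OTTL used for PII redaction, the sketch below configures the contrib transform processor. The attribute names (user.email, http.request.header.authorization) and the email regex are illustrative assumptions, and the processor still has to be added to your traces and logs pipelines.

```yaml
processors:
  transform/redact_pii:
    # Skip statements that fail (e.g., a non-string body) instead of dropping data.
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # Overwrite a hypothetical user.email attribute when it is present.
          - set(attributes["user.email"], "[REDACTED]") where attributes["user.email"] != nil
          # Drop a raw credential header attribute entirely.
          - delete_key(attributes, "http.request.header.authorization")
    log_statements:
      - context: log
        statements:
          # Mask email-like substrings inside string log bodies.
          - replace_pattern(body, "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+[.][A-Za-z]{2,}", "[REDACTED_EMAIL]")
```

The log statement evaluates a regex against every record, which is exactly the kind of transformation the Known Limitations section recommends profiling under production traffic.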

Skill Structure

SKILL.md acts as the cognitive router — a compact instruction set that tells the AI how to reason about observability before generating any output. docs/index.md is the tile's on-demand documentation entrypoint for Tessl, and references/ contains the deep-dive documents that the skill links to when specific topics are triggered.

📊 Content Overview

  • Packaged reference docs for architecture, collector design, instrumentation, security, sampling, AI agents, and compatibility
  • AI coding agent coverage tracked with upstream monitoring
  • Production-tested configurations with validation commands
  • Kept current: a weekly upstream-maintenance workflow tracks the latest OpenTelemetry releases

Installation

skills.sh

Install this skill with the skills.sh CLI:

npx skills add o11y-dev/opentelemetry-skill

Tessl Registry

Install this tile from the Tessl registry (workspace: o11y-dev):

tessl tile install o11y-dev/opentelemetry-skill

GitHub Copilot

Attach SKILL.md as a custom instructions file, or reference the repository as a Copilot Skill in your Copilot settings: https://github.com/o11y-dev/opentelemetry-skill

Claude

Add SKILL.md to your project knowledge or paste it into your system prompt.

Cursor

Plugin manifests are available in .cursor-plugin/ for use with the Cursor marketplace.

Other AI Systems

Point your agent at SKILL.md as the primary instruction set, with references/ available for context loading.

Architecture

opentelemetry-skill/
├── .claude-plugin/
│   └── marketplace.json      # Plugin metadata
├── .cursor-plugin/
│   ├── marketplace.json      # Cursor marketplace metadata
│   └── plugin.json           # Cursor plugin manifest
├── docs/
│   └── index.md              # Tessl docs entrypoint for bundled references and eval assets
├── SKILL.md                  # Cognitive router (the "brain")
├── README.md                 # This file
├── references/
│   ├── ai-agents.md          # AI agent observability patterns & configurations
│   ├── architecture.md       # Deployment patterns & scaling
│   ├── compatibility.md      # Version-sensitive support and compatibility notes
│   ├── collector.md          # Pipeline configuration & components
│   ├── instrumentation.md    # SDKs & semantic conventions
│   ├── sampling.md           # Sampling strategies
│   ├── security.md           # PII redaction & authentication
│   └── monitoring.md         # Self-monitoring patterns
└── LICENSE                   # Apache 2.0

Architecture Patterns

| Category | Pattern | Description |
|---|---|---|
| Kubernetes | DaemonSet / Gateway / Sidecar | Choose based on workload type and data volume |
| Serverless | FaaS Extension Layer | Lambda, Azure Functions, GCP with non-blocking export |
| Sampling | Head / Tail Sampling | Trade-off between cost and completeness |
| Security | mTLS + RBAC | Secure cross-network telemetry pipelines |
| AI Agents | Agent Telemetry | Monitor coding agents as first-class services in your observability stack |
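For the mTLS pattern, here is a sketch of both ends of a cross-network hop: the gateway's OTLP receiver requires client certificates signed by a shared CA, and the agent's OTLP exporter presents one. All file paths and the gateway hostname are hypothetical, and RBAC is not covered by this fragment.

```yaml
# Gateway collector (receiving side)
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        tls:
          cert_file: /etc/otelcol/tls/server.crt
          key_file: /etc/otelcol/tls/server.key
          # Requiring client certs signed by this CA is what makes the TLS mutual.
          client_ca_file: /etc/otelcol/tls/ca.crt
---
# Agent collector (sending side)
exporters:
  otlp:
    endpoint: gateway.example.com:4317  # hypothetical gateway address
    tls:
      # Verify the gateway's certificate and present our own client certificate.
      ca_file: /etc/otelcol/tls/ca.crt
      cert_file: /etc/otelcol/tls/client.crt
      key_file: /etc/otelcol/tls/client.key
```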

Usage Examples

Example 1: Gateway Deployment for Tail Sampling

User: "I need to deploy an OpenTelemetry gateway in Kubernetes for tail sampling."

AI Response (leveraging the skill):

  • ✅ Asks about throughput to size replicas
  • ✅ Loads references/architecture.md and references/sampling.md
  • ✅ Generates Deployment with loadbalancing exporter (routing_key: traceID)
  • ✅ Includes Headless Service for sticky sessions
  • ✅ Configures tail_sampling processor with error/latency policies
  • ✅ Warns about Beta stability level
  • ✅ Provides validation commands
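A condensed sketch of the setup Example 1 describes, split into three YAML documents: the agent tier's loadbalancing exporter, the headless Service exposing individual gateway pods, and the gateway tier's tail_sampling processor. Names such as otel-gateway-headless and the observability namespace, the thresholds, and the insecure transport are placeholder assumptions.

```yaml
# Agent tier (e.g., DaemonSet): send all spans of a trace to the same gateway pod.
exporters:
  loadbalancing:
    routing_key: traceID
    protocol:
      otlp:
        tls:
          insecure: true  # placeholder only; enable TLS in production
    resolver:
      dns:
        # The headless Service resolves to every gateway pod IP, not a single virtual IP.
        hostname: otel-gateway-headless.observability.svc.cluster.local
        port: 4317
---
# Headless Service in front of the gateway Deployment.
apiVersion: v1
kind: Service
metadata:
  name: otel-gateway-headless
  namespace: observability
spec:
  clusterIP: None
  selector:
    app: otel-gateway
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
---
# Gateway tier: buffer complete traces, then decide what to keep.
processors:
  tail_sampling:
    decision_wait: 10s   # how long to wait for a trace's spans before deciding
    num_traces: 50000    # traces held in memory; drives memory usage
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: keep-slow
        type: latency
        latency:
          threshold_ms: 500
      - name: baseline
        type: probabilistic
        probabilistic:
          sampling_percentage: 5
```

Hashing the trace ID across the resolved pod IPs is what makes routing "sticky": each gateway replica sees whole traces. In the tail_sampling processor a trace is kept if any policy decides to sample it, so the probabilistic policy retains a small baseline of ordinary traffic next to errors and slow traces.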

Example 2: Preventing Data Loss

User: "Ensure we don't lose telemetry data if the backend goes down."

AI Response:

  • ✅ Loads references/collector.md
  • ✅ Configures file_storage extension
  • ✅ Attaches persistent queues to exporters
  • ✅ Explains disk space requirements
  • ✅ Provides PersistentVolume YAML for Kubernetes
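A minimal sketch of the pattern in Example 2: the file_storage extension provides on-disk storage and the exporter's sending queue uses it, so queued data survives a collector restart while the backend is unreachable. The directory, endpoint, and sizes are placeholders; in Kubernetes the directory would sit on a mounted PersistentVolume.

```yaml
extensions:
  file_storage:
    # Must be writable and persistent across restarts (e.g., a PersistentVolume mount).
    directory: /var/lib/otelcol/file_storage
    timeout: 1s

exporters:
  otlp:
    endpoint: backend.example.com:4317  # hypothetical backend
    sending_queue:
      enabled: true
      # Back the queue with file_storage instead of memory.
      storage: file_storage
      queue_size: 5000  # placeholder; size against available disk
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 10m

service:
  # The extension must also be enabled here, not only declared above.
  extensions: [file_storage]
```

Disk usage grows while the backend is down, and a batch that keeps failing past max_elapsed_time is dropped, so size the volume and the retry window together.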

Example 3: Cardinality Guard

User: "Add user_id as a metric dimension."

AI Response:

  • Blocks the request: "user_id violates the Rule of 100—it's unbounded and will cause metric explosion."
  • ✅ Suggests alternative: "Use user_id as a trace attribute instead, and create a metric for aggregated user counts."
  • ✅ Loads references/instrumentation.md to explain cardinality management
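If user_id has already reached metric data points, one way to enforce the guard at the collector is to delete the attribute with the transform processor. This sketch assumes the data point attribute is literally named user_id; span attributes are untouched because only metric_statements are defined.

```yaml
processors:
  transform/drop_user_id:
    error_mode: ignore
    metric_statements:
      - context: datapoint
        statements:
          # Remove the unbounded user_id dimension from every metric data point.
          - delete_key(attributes, "user_id")
```

Deleting a dimension does not re-aggregate data points that now share identical attributes, so removing user_id from the metric instrumentation itself remains the cleaner fix.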

See SKILL.md for the full list of progressive disclosure triggers, System 2 thinking signals, core principles, and production-ready configuration defaults.

Reference Documentation

Deep-dive guides are available in the references/ directory:

  • ai-agents.md: AI agent observability patterns, per-agent setup guidance, dashboards, and operational caveats
  • architecture.md: Deployment patterns, load balancing, Target Allocator
  • collector.md: Pipeline anatomy, processor ordering, memory management
  • instrumentation.md: SDKs, semantic conventions, cardinality management
  • ottl.md: OpenTelemetry Transformation Language syntax, functions, patterns, and best practices
  • platforms.md: FaaS (Lambda, Azure, GCP), client-side apps, serverless best practices
  • sampling.md: Head vs tail, probabilistic strategies, sticky sessions
  • security.md: PII redaction, TLS, extension security
  • monitoring.md: Collector metrics, dashboards, alerts
  • playbooks.md: Reusable production playbooks distilled from OpenTelemetry blog posts and real-world deployment stories

Contrib Components & Example Configs

The OpenTelemetry Collector Contrib repository contains extended components and curated example configurations. Always verify component stability and pin to released versions (e.g., v0.100.0+) instead of main.

Stability & Registry

  • VERSIONING.md: Component stability matrix (Stable/Beta/Alpha/Development)

Example Configurations

  • examples/: Curated collector configurations
    • Gateway deployments with tail sampling
    • Agent/DaemonSet configurations for Kubernetes
    • Logging and filelog receiver examples
    • Kubernetes attribute enrichment patterns

Best Practice: Always pin to released tags matching your collector version (e.g., v0.100.0+) instead of using main branch for production stability.
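One way to pin every component to a single released tag is to build a custom distribution with the OpenTelemetry Collector Builder (ocb). The manifest below is a sketch: the distribution name, output path, and component selection are hypothetical, and v0.100.0 simply mirrors the example version above. Module paths occasionally move between releases, so check them against the release you pin.

```yaml
# ocb manifest: every component pinned to the same released tag.
dist:
  name: otelcol-custom
  description: Custom collector pinned to v0.100.0
  output_path: ./otelcol-custom

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.100.0

processors:
  - gomod: go.opentelemetry.io/collector/processor/memorylimiterprocessor v0.100.0
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.100.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor v0.100.0

exporters:
  - gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.100.0
```

Running ocb --config with this file produces a binary whose core and contrib components all come from the same tag, avoiding the drift that builds from main can introduce.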

Testing & Validation

This skill includes a comprehensive test and validation framework following TDD (Test-Driven Development) principles.

The testing framework validates that the skill actually changes AI behavior and steers it away from common anti-patterns. GitHub Actions automatically validates skill structure and content on every change, and the Tessl report workflow posts best-practice review feedback on every pull request.

An additional GitHub Agentic Workflow (.github/workflows/otel-upstream-maintenance.yml) runs weekly to create an upstream maintenance digest issue with recent OpenTelemetry GitHub issues, releases, and blog/community updates for practical repository refreshes.

Contributing

This skill is designed to evolve with the OpenTelemetry ecosystem. Contributions are welcome:

  1. Update Reference Docs: As OTel features stabilize, update stability warnings
  2. Add Patterns: New deployment architectures (e.g., eBPF-based collection)
  3. Expand Examples: Language-specific SDK patterns
  4. Improve Triggers: Refine the progressive disclosure logic

Known Limitations

  • AI agent trace coverage varies: Claude Code does not emit traces natively; observability relies on opentelemetry-hooks or native logs/metrics. Each agent has different signal coverage.
  • Tail sampling memory: Scales with in-flight trace count. Beyond 10k RPS, consider tiered architecture (Agent -> Gateway -> Analysis) rather than single-collector tail sampling.
  • OTTL regex transforms: Can impact p99 latency at high span volume. Profile with production traffic before deploying regex-heavy transformations.
  • Semantic conventions are evolving: The gen_ai.* namespace is experimental. Attribute names may change in future OpenTelemetry releases.
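  • Kubernetes version requirements: Native sidecar containers (init containers with restartPolicy: Always) are beta and enabled by default from v1.29, after an alpha behind the SidecarContainers feature gate in v1.28. Earlier versions need traditional sidecar patterns.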
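A minimal sketch of the native sidecar pattern for clusters that support it: the collector runs as an init container with restartPolicy: Always, so Kubernetes starts it before the application and keeps it running alongside. The image tag, application image, and ConfigMap name are hypothetical, and the sidecar's config.yaml is assumed to expose an OTLP/HTTP receiver on port 4318.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-otel-sidecar  # hypothetical name
spec:
  initContainers:
    # Native sidecar: restartPolicy: Always keeps this init container running
    # for the Pod's lifetime instead of waiting for it to exit.
    - name: otel-collector
      image: otel/opentelemetry-collector-contrib:0.100.0
      restartPolicy: Always
      args: ["--config=/conf/config.yaml"]
      volumeMounts:
        - name: otel-config
          mountPath: /conf
  containers:
    - name: app
      image: registry.example.com/my-app:1.0.0  # hypothetical application image
      env:
        # Standard OTel SDK variables pointing the app at the local sidecar.
        - name: OTEL_EXPORTER_OTLP_PROTOCOL
          value: http/protobuf
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: http://localhost:4318
  volumes:
    - name: otel-config
      configMap:
        name: otel-sidecar-config  # hypothetical ConfigMap with a config.yaml key
```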

Roadmap

  • Expand AI agent observability coverage as new agents ship native telemetry (Qwen Code, Windsurf, Zed)
  • Track OpenTelemetry semantic convention releases for gen_ai namespace stabilization
  • Add cost optimization patterns for high-volume agent deployments
  • Expand production playbook coverage with new upstream blog posts
  • Add eBPF-based collection patterns for auto-instrumentation
  • Collector processor stability matrix tracking across releases

Compatibility

Compatibility details move faster than the cognitive-router guidance in SKILL.md. See references/compatibility.md for the current version floors and AI agent support notes.

License

This skill is licensed under the Apache License 2.0. See LICENSE for details.

The OpenTelemetry project itself is a CNCF project licensed under Apache 2.0.

Acknowledgments

  • OpenTelemetry Community: For building the foundational observability standard
  • monitoringartist: For the collector monitoring dashboards and patterns

Transform your AI into an observability-focused assistant. Production-ready. AI-agent aware.
Deploy with confidence. Observe with precision.