Expert OpenTelemetry guidance for collector configuration, pipeline design, and production telemetry instrumentation. Use when configuring collectors, designing pipelines, instrumenting applications, implementing sampling, managing cardinality, securing telemetry, writing OTTL transformations, or setting up AI coding agent observability (Claude Code, Codex, Gemini CLI, GitHub Copilot).
A comprehensive guide to monitoring AI coding agents (Claude Code, Gemini CLI, GitHub Copilot, Codex CLI, and others) via OpenTelemetry.
<!-- UPSTREAM MONITORING NOTE: This file is automatically flagged for review when changes occur in:
- GitHub repositories: github/copilot-cli, Aider-AI/aider, openai/codex, google-gemini/gemini-cli, anthropics/claude-code, anthropics/skills, QwenLM/qwen-code, microsoft/vscode-copilot-chat, anysphere/cursor-wiki, anomalyco/opencode, DEVtheOPS/opencode-plugin-otel, badlogic/pi-mono
- OpenTelemetry semantic conventions: open-telemetry/semantic-conventions (gen-ai model)
- Manual monitoring recommended for official docs: docs.github.com/copilot/, aider.chat/docs/, developers.openai.com/codex/, google-gemini.github.io/gemini-cli/, claude.ai/code/, qwenlm.github.io/qwen-code-docs/, cursor.com, pi.dev -->

| Agent | Vendor | Native OTel | Traces | Metrics | Logs/Events | GenAI SemConv | Hooks Support | Config Method | Config File / Env Vars | Protocol | Official Docs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude Code | Anthropic | ⚠️ metrics/logs only | ❌ | ✅ | ✅ | ❌ (custom claude_code.*) | ✅ governance wrapper | Env vars or ~/.claude/settings.json | CLAUDE_CODE_ENABLE_TELEMETRY, OTEL_* | OTLP gRPC/HTTP | docs |
| Gemini CLI | Google | ✅ full | ✅ | ✅ | ✅ | ✅ (gen_ai.*) | ✅ governance wrapper | .gemini/settings.json or env vars | GEMINI_TELEMETRY_* | OTLP gRPC | docs |
| GitHub Copilot VS Code | Microsoft | ✅ full | ✅ | ✅ | ✅ | ✅ (gen_ai.*) | ⚠️ launcher wrapper only | VS Code settings.json or env var | COPILOT_OTEL_ENABLED | OTLP HTTP | docs |
| GitHub Copilot CLI | Microsoft | ✅ full | ✅ | ✅ | ✅ | ✅ (gen_ai.*) | ✅ governance wrapper | Env vars (same span model as VS Code) | COPILOT_OTEL_ENABLED | OTLP HTTP | docs |
| OpenAI Codex CLI | OpenAI | ⚠️ partial | ⚠️ interactive only | ⚠️ interactive only | ✅ | ❌ (custom event names) | ✅ gap-filler + governance | ~/.codex/config.toml [otel] section | ~/.codex/config.toml | OTLP gRPC | docs |
| Qwen Code | Alibaba | 🔜 planned | 🔜 planned | 🔜 planned | 🔜 planned | 🔜 planned | ✅ interim bridge | .qwen/settings.json | .qwen/settings.json | OTLP | docs |
| OpenCode | Anomaly | ❌ none | ❌ | ❌ | ❌ | ❌ | ✅ primary | Community plugin only | n/a | n/a | plugin |
| Pi Agent | open-source | ❌ none | ❌ | ❌ | ⚠️ install telemetry only | ❌ | ✅ primary | ~/.pi/agent/settings.json or .pi/settings.json | PI_TELEMETRY, enableInstallTelemetry | n/a | docs |
| Cursor | Anysphere | ❌ none | ❌ | ❌ | ❌ | ❌ | ⚠️ launcher wrapper only | Via MCP servers only | n/a | n/a | — |
| Windsurf | Cognition | ❌ none | ❌ | ❌ | ❌ | ❌ | ⚠️ launcher wrapper only | Agent skills for user code only | n/a | n/a | — |
| Amazon Q Developer | AWS | ❌ no OTLP | ❌ | ❌ | ❌ | ❌ | ✅ primary | CloudWatch/CloudTrail only | n/a | n/a | — |
| Aider | open-source | ❌ none | ❌ | ❌ | ❌ | ❌ | ✅ primary | External wrapper only | n/a | n/a | — |
Claude Code emits metrics and logs/events only — no traces. Telemetry is opt-in.
Minimum config (env vars):
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

Persistent config (~/.claude/settings.json):
{
"env": {
"CLAUDE_CODE_ENABLE_TELEMETRY": "1",
"OTEL_METRICS_EXPORTER": "otlp",
"OTEL_LOGS_EXPORTER": "otlp",
"OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
"OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4317",
"OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE": "cumulative"
}
}

Privacy controls:
| Env Var | Default | Effect |
|---|---|---|
| OTEL_LOG_USER_PROMPTS | false | Includes raw user prompts in log events |
| OTEL_LOG_TOOL_DETAILS | false | Includes tool call parameters in logs |
| OTEL_METRICS_INCLUDE_SESSION_ID | false | Adds session.id as metric dimension (⚠️ high cardinality) |
⚠️ Temporality: Claude Code emits cumulative metrics. Set OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE=cumulative to match. VictoriaMetrics and some Prometheus backends will silently drop delta-converted metrics from cumulative sources.
Gemini CLI emits full traces + metrics + logs using GenAI semantic conventions (gen_ai.*).
Config file (.gemini/settings.json):
{
"telemetry": {
"enabled": true,
"otlpEndpoint": "http://localhost:4317",
"otlpProtocol": "grpc",
"logPrompts": false
}
}

Env var override:
export GEMINI_TELEMETRY_ENABLED=true
export GEMINI_TELEMETRY_OTLP_ENDPOINT=http://localhost:4317

✅ Gemini CLI v0.34.0+ follows gen_ai.* GenAI semantic conventions. Traces include full span hierarchy for multi-step agent operations.
VS Code settings.json:
{
"github.copilot.chat.otel.enabled": true,
"github.copilot.chat.otel.otlpEndpoint": "http://localhost:4318",
"github.copilot.chat.otel.exporterType": "otlp-http",
"github.copilot.chat.otel.captureContent": false
}

Env var alternative:
export COPILOT_OTEL_ENABLED=true
export COPILOT_OTEL_OTLP_ENDPOINT=http://localhost:4318

⚠️ captureContent: true captures full prompts and responses. Keep this false in shared or production environments. See Privacy section.
Copilot CLI shares the same span model as the VS Code extension. Uses OTLP HTTP by default.
export COPILOT_OTEL_ENABLED=true
export COPILOT_OTEL_OTLP_ENDPOINT=http://localhost:4318

Codex CLI supports telemetry in interactive mode only. codex exec and codex mcp-server have known gaps (see Known Gaps).
Config file (~/.codex/config.toml):
[otel]
exporter = { otlp-grpc = { endpoint = "http://localhost:4317" } }
log_user_prompt = false

Minimum config only:
[otel]
exporter = { otlp-grpc = { endpoint = "http://localhost:4317" } }

⚠️ Codex v0.105.0+ is required. codex exec drops metrics entirely. codex mcp-server has zero OTel support. See open issue #12913.
Docs describe a telemetry system with .qwen/settings.json, but the corresponding code has not shipped as of 2026-03. Monitor the Qwen Code telemetry docs for updates.
Planned config (.qwen/settings.json):
{
"telemetry": {
"enabled": true,
"otlpEndpoint": "http://localhost:4317"
}
}

Use opentelemetry-hooks as a hook-based instrumentation layer around an agent invocation (typically a CLI entrypoint). Hooks serve three practical roles: a primary instrumentation path for agents with no native OpenTelemetry, a gap-filler for agents with partial native coverage, and an outer governance/control wrapper for agents that already emit telemetry but still need standardized invocation-level controls. Because hooks sit outside the agent process, they can standardize process-level telemetry and enforcement across heterogeneous agents without modifying the agent binary.
Scope: opentelemetry-hooks instruments the wrapped process invocation. For fully CLI-based agents (OpenCode, Aider, Amazon Q Developer CLI) this captures each agent run end-to-end. For GUI-first editors (Cursor, Windsurf) wrapping the launch command provides limited value because the main agent activity occurs inside the desktop process after startup; only the launch duration and exit code are reliably captured. Use the hooks approach for Cursor/Windsurf only if you have a headless/CLI agent invocation (for example cursor --headless or a Windsurf CLI subcommand).
Quick start with opentelemetry-hooks:
# Install
pip install opentelemetry-hooks
# Wrap CLI-based agents (full coverage)
otel-hooks --service-name aider --otlp-endpoint http://localhost:4317 -- aider <args>
otel-hooks --service-name opencode --otlp-endpoint http://localhost:4317 -- opencode <args>
# Wrap GUI-based agents (launch/exit coverage only)
otel-hooks --service-name cursor --otlp-endpoint http://localhost:4317 -- cursor <args>

What opentelemetry-hooks captures:
| Signal | Details |
|---|---|
| Spans | Start/end per invocation, child spans for subprocesses |
| Metrics | Wall-clock duration, exit code, process CPU/memory |
| Logs | stdout/stderr lines as log records with severity |
Privacy warning: Capturing stdout/stderr as logs can include prompts, source code, configuration, secrets (for example, API keys or tokens), and other sensitive data. Before enabling this, review your data-handling requirements and configure your OpenTelemetry pipeline or opentelemetry-hooks to disable or redact stdout/stderr capture where needed (for example, via log filtering/redaction or by turning off log export). See §6. Privacy & Cardinality Considerations for guidance.
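If you keep hooks log export enabled but want raw stdout/stderr lines dropped at the collector, a filter processor can remove them before export. A minimal sketch; the log.source attribute name is an assumption, so verify what your opentelemetry-hooks version actually sets on each record:

processors:
  filter/drop_agent_stdio:
    logs:
      log_record:
        # Drop records derived from process stdout/stderr (attribute name assumed)
        - attributes["log.source"] == "stdout"
        - attributes["log.source"] == "stderr"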
| Agent | Native OTel | Hooks Role | Recommended Usage |
|---|---|---|---|
| Claude Code | ⚠️ metrics/logs only | Governance wrapper | Keep native metrics/logs enabled; add hooks when you need standardized start/stop audit events, resource attributes, or launch-time controls across agents. |
| Gemini CLI | ✅ full | Governance wrapper | Prefer native telemetry for traces and GenAI semantics; add hooks only for organization-wide process-boundary controls or uniform invocation audit events. |
| GitHub Copilot CLI | ✅ full | Governance wrapper | Use native telemetry for primary observability; add hooks when you need consistent launch policies, ownership tags, or process-boundary audit signals across multiple CLI agents. |
| GitHub Copilot VS Code | ✅ full | Limited launcher wrapper | Prefer native telemetry. Hooks can wrap the editor launch, but they provide only outer-process coverage because most agent activity occurs inside the desktop process after startup. |
| OpenAI Codex CLI | ⚠️ partial | Gap-filler + governance | Use native OTel where available, especially interactive mode. Add hooks to cover outer invocation telemetry, standardize controls, and partially bridge exec/mcp-server gaps. |
| Qwen Code | 🔜 planned | Primary until native ships | Treat hooks as an interim process-level bridge while the documented native telemetry remains unshipped. Move to native telemetry once the implementation is verifiable. |
| OpenCode | ❌ none | Primary | Use opentelemetry-hooks as the primary instrumentation path; community plugin: opencode-plugin-otel is an additional fallback. Feature request: #14697. |
| Cursor | ❌ none | Limited launcher wrapper | Wrap only when you have a headless/CLI invocation. For the desktop app, hooks provide launch/exit coverage only; MCP servers instrument user code, not Cursor itself. |
| Windsurf | ❌ none | Limited launcher wrapper | Wrap only CLI/headless entrypoints. For the desktop app, hooks provide launch/exit coverage only; Windsurf agent skills can instrument user code but not Windsurf itself. |
| Amazon Q Developer | ❌ no OTLP | Primary | Native signals are CloudWatch/CloudTrail-oriented rather than OTLP. For process-level OTLP spans, metrics, and logs from the Q Developer CLI process, wrap it with hooks. |
| Aider | ❌ none | Primary | Use opentelemetry-hooks as the primary process-level instrumentation path instead of a custom shell-script wrapper. |
Even when native OpenTelemetry exists, hooks are useful above the agent as a lightweight control layer. Use them to attach standard resource attributes across all agents, enforce required environment/config before invocation, emit uniform start/stop audit events, apply pre-export filtering or redaction to stdout/stderr-derived logs, and add consistent ownership, cost-center, or environment tags. This creates organization-wide boundaries and policies that are independent of any single vendor's telemetry maturity.
⚠️ Hooks provide process-level instrumentation only. They complement native telemetry, but they do not replace in-process agent signals such as token counts, model metadata, internal tool-call spans, or semantic-convention-rich events emitted by the agent itself.
A single OTel Collector instance can receive telemetry from all agents simultaneously on standard OTLP ports. Prefer OTLP gRPC end-to-end when agents and backends support it; keep OTLP HTTP enabled where an agent, managed ingress, or backend only exposes HTTP, or where gRPC is not possible.
# otel-collector-ai-agents.yaml
# Production-ready config for multi-agent AI coding observability
# Tested with OTel Collector v0.150.0+
extensions:
health_check:
endpoint: localhost:13133
file_storage:
directory: /var/lib/otelcol/filestore
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317 # Preferred OTLP receiver: Claude Code, Gemini CLI, Codex CLI
http:
endpoint: 0.0.0.0:4318 # HTTP fallback/interop: GitHub Copilot VS Code/CLI and HTTP-only clients
processors:
# CRITICAL: memory_limiter MUST be first processor in every pipeline
memory_limiter:
check_interval: 1s
limit_percentage: 80
spike_limit_percentage: 20
# Normalize service.name across all agents
resource:
attributes:
- key: service.name
action: upsert
from_attribute: service.name
# Tag all AI agent telemetry for easy filtering
- key: telemetry.source.type
value: ai-coding-agent
action: insert
# Map custom claude_code.* prefixes to gen_ai.* where semantically equivalent
transform/normalize_agent_metrics:
metric_statements:
- context: datapoint
statements:
# Claude Code uses claude_code.* prefix — surface agent name for dashboards
- set(attributes["gen_ai.system"], "claude_code") where resource.attributes["service.name"] == "claude_code"
- set(attributes["gen_ai.system"], "gemini_cli") where resource.attributes["service.name"] == "gemini_cli"
log_statements:
- context: log
statements:
# Normalize agent identifier in log body for cross-agent queries
- set(attributes["gen_ai.system"], "claude_code") where resource.attributes["service.name"] == "claude_code"
# Redact secrets from tool_parameters (reuse security.md pattern)
transform/redact_secrets:
log_statements:
- context: log
statements:
- replace_pattern(attributes["tool.parameters"], "(?i)(api[_-]?key|secret|token|password)[\"'\\s]*[:=][\"'\\s]*[^\\s,}]+", "REDACTED")
batch:
timeout: 10s
send_batch_size: 1024
exporters:
# Metrics → Prometheus (scraped by Grafana)
prometheus:
endpoint: 0.0.0.0:8889
namespace: ai_agent
resource_to_telemetry_conversion:
enabled: true
# OTLP HTTP exporter example — use when the backend or ingress only accepts OTLP HTTP
otlphttp/loki:
endpoint: http://loki:3100/otlp
sending_queue:
enabled: true
storage: file_storage
retry_on_failure:
enabled: true
# Preferred OTLP gRPC exporter example
otlp/tempo:
endpoint: tempo:4317
tls:
insecure: true
sending_queue:
enabled: true
storage: file_storage
retry_on_failure:
enabled: true
service:
extensions: [health_check, file_storage]
pipelines:
# Metrics pipeline — all agents
metrics:
receivers: [otlp]
processors: [memory_limiter, resource, transform/normalize_agent_metrics, batch]
exporters: [prometheus]
# Logs/Events pipeline — all agents
logs:
receivers: [otlp]
processors: [memory_limiter, resource, transform/normalize_agent_metrics, transform/redact_secrets, batch]
exporters: [otlphttp/loki]
# Traces pipeline — Gemini CLI, Copilot only (others emit nothing here)
traces:
receivers: [otlp]
processors: [memory_limiter, resource, batch]
exporters: [otlp/tempo]

Protocol choice: Prefer OTLP gRPC on 4317 for both receivers and exporters. Keep OTLP HTTP on 4318 available for agents like GitHub Copilot and for backends, proxies, or managed ingest endpoints where gRPC is unavailable.
Processor ordering: memory_limiter is always first. The resource processor runs before transform so enriched attributes are available for OTTL statements. batch is always last before exporters.
| Agent | Metric Name | Type | Unit | Key Attributes |
|---|---|---|---|---|
| Claude Code | claude_code.tokens.input | Counter | {token} | model, session.id |
| Claude Code | claude_code.tokens.output | Counter | {token} | model, session.id |
| Claude Code | claude_code.cost.usd | Counter | USD | model |
| Claude Code | claude_code.api.request.duration | Histogram | ms | model, status |
| Claude Code | claude_code.tool.call.count | Counter | {call} | tool.name, status |
| Claude Code | claude_code.cache.read.tokens | Counter | {token} | model |
| Gemini CLI | gen_ai.client.token.usage | Counter | {token} | gen_ai.system, gen_ai.token.type, gen_ai.operation.name |
| Gemini CLI | gen_ai.client.operation.duration | Histogram | s | gen_ai.system, gen_ai.operation.name, gen_ai.response.finish_reason |
| GitHub Copilot | gen_ai.client.token.usage | Counter | {token} | gen_ai.system, gen_ai.token.type |
| GitHub Copilot | gen_ai.client.operation.duration | Histogram | s | gen_ai.system, gen_ai.operation.name |
| Codex CLI | codex.tokens.used | Counter | {token} | model, direction |
| Codex CLI | codex.request.latency | Histogram | ms | model, status |
⚠️ Plan dashboards for evolving gen_ai.token.type values. Do not assume GenAI token metrics are permanently limited to input and output. Newer semantic-convention work is adding finer-grained categories such as cache and reasoning tokens. Build charts and cost rollups so unknown token types are grouped, not discarded.
SemConv v1.40.0 review: Preserve gen_ai.agent.version, gen_ai.usage.cache_read.input_tokens, and gen_ai.usage.cache_creation.input_tokens when agents emit them. These attributes help distinguish agent releases and cached-token behavior without collapsing everything back into a fixed input/output schema.
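A PromQL sketch for token-type-safe rollups: group by the token-type label instead of enumerating known values, so new categories appear as new series rather than vanishing. Series and label names assume the §3 prometheus exporter (namespace ai_agent, resource_to_telemetry_conversion) and standard OTLP-to-Prometheus name mapping; verify against your backend.

# Token usage per token type; unknown types surface instead of being dropped
sum by (gen_ai_token_type) (rate(ai_agent_gen_ai_client_token_usage_total[5m]))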
| Agent | Event Name | Key Attributes | Correlation ID Field |
|---|---|---|---|
| Claude Code | gen_ai.user.message | gen_ai.system, session.id, prompt.id | prompt.id |
| Claude Code | gen_ai.assistant.message | gen_ai.system, session.id, prompt.id, model | prompt.id |
| Claude Code | gen_ai.tool.message | tool.name, session.id, prompt.id | prompt.id |
| Claude Code | claude_code.api.request | model, prompt.id, input_tokens, output_tokens, cost_usd | prompt.id |
| Gemini CLI | gen_ai.user.message | gen_ai.system, gen_ai.conversation.id | gen_ai.conversation.id |
| Gemini CLI | gen_ai.assistant.message | gen_ai.system, gen_ai.conversation.id, gen_ai.response.model | gen_ai.conversation.id |
| GitHub Copilot | gen_ai.user.message | gen_ai.system, gen_ai.thread.id | gen_ai.thread.id |
| GitHub Copilot | gen_ai.choice | gen_ai.system, gen_ai.response.finish_reason | gen_ai.thread.id |
| Codex CLI | codex.session.start | session.id, model, working_dir | session.id |
| Codex CLI | codex.session.end | session.id, total_tokens, total_cost_usd | session.id |
| Agent | Span Name | Kind | Key Attributes | Child Spans |
|---|---|---|---|---|
| Gemini CLI | gen_ai.chat | CLIENT | gen_ai.system, gen_ai.operation.name, gen_ai.request.model | tool call spans |
| Gemini CLI | execute_tool | INTERNAL | gen_ai.tool.name, gen_ai.tool.call.id | none |
| GitHub Copilot | gen_ai.chat | CLIENT | gen_ai.system, gen_ai.operation.name | completion spans |
| GitHub Copilot | gen_ai.completion | INTERNAL | gen_ai.response.finish_reason, gen_ai.usage.input_tokens | none |
Note: Claude Code emits no traces. Use prompt.id correlation across log events as a pseudo-trace (see Known Gaps).
| Dashboard | Agents Covered | Stack | Link |
|---|---|---|---|
| ai-observer | Claude Code + Gemini CLI + Codex CLI | Any OTLP backend | github.com/tobilg/ai-observer |
| claude-code-otel | Claude Code | Grafana + Prometheus | github.com/ColeMurray/claude-code-otel |
| Honeycomb Claude Code template | Claude Code | Honeycomb | Built-in board template (search "Claude Code" in Honeycomb) |
| Gemini CLI GCP Monitoring | Gemini CLI | GCP Monitoring | Pre-configured template in GCP Console |
Build these panels for a team-facing AI agent observability dashboard:
Token usage by agent/user/model over time
- Metrics: claude_code.tokens.input + claude_code.tokens.output (Claude Code); gen_ai.client.token.usage (Gemini, Copilot)
- Dimensions: model, gen_ai.system (NOT session.id — high cardinality)

Cost breakdown by agent and model
- Metrics: claude_code.cost.usd (Claude Code); derived from token counts × model pricing for others
- Dimensions: gen_ai.system, model

API request latency (p50/p95/p99)
- Metrics: claude_code.api.request.duration (Claude Code); gen_ai.client.operation.duration (GenAI SemConv agents)

Tool call success/failure rates
- Metrics: claude_code.tool.call.count with status dimension; gen_ai.tool.message events by status

Active sessions / DAU/WAU/MAU
- Source: session.id (count distinct via log query, not metric dimension)

Cache hit ratio (Claude Code)
- Formula: claude_code.cache.read.tokens / (claude_code.tokens.input + claude_code.cache.read.tokens)
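A PromQL sketch of this ratio, assuming the §3 prometheus exporter naming (namespace ai_agent, counter suffix _total); verify the actual series names in your backend:

# Claude Code cache hit ratio over the last 5m (series names assumed)
sum(rate(ai_agent_claude_code_cache_read_tokens_total[5m]))
/
(
  sum(rate(ai_agent_claude_code_tokens_input_total[5m]))
  + sum(rate(ai_agent_claude_code_cache_read_tokens_total[5m]))
)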
| Field | Cardinality | Recommendation |
|---|---|---|
| prompt.id | Unbounded | Use in logs/events only, never as metric dimension |
| session.id | Unbounded | Use in logs/events only; keep OTEL_METRICS_INCLUDE_SESSION_ID=false |
| user.id | Bounded by team size | Acceptable as metric dimension for small teams (<1000 users); use logs for larger orgs |
| model | Low (~5–20 values) | Safe as metric dimension |
| gen_ai.system | Low (~10 values) | Safe as metric dimension |
| tool.name | Low–Medium | Acceptable as metric dimension if tools are bounded |
Rule of 100: Any attribute with >100 unique values should NOT be a metric dimension. Use logs or traces instead.
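To enforce this rule at the collector instead of trusting every agent's configuration, a transform processor can strip known high-cardinality attributes from metric datapoints. A minimal sketch using standard OTTL delete_key:

processors:
  transform/drop_high_cardinality:
    metric_statements:
      - context: datapoint
        statements:
          # session.id and prompt.id remain available in logs/events,
          # but never become metric dimensions
          - delete_key(attributes, "session.id")
          - delete_key(attributes, "prompt.id")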
| Agent | Default | Opt-in for Content |
|---|---|---|
| Claude Code | Prompts redacted | OTEL_LOG_USER_PROMPTS=true |
| Codex CLI | Prompts redacted | log_user_prompt = true in config.toml |
| GitHub Copilot | Content not captured | captureContent: true in settings |
| Gemini CLI | Prompts not logged | logPrompts: true in settings.json |
⚠️ Production Warning: Never enable prompt capture in shared or production environments without explicit PII controls. User prompts frequently contain secrets, credentials, and personal data.
Add to your collector config to redact secrets from tool parameters before they reach backends:
transform/redact_agent_secrets:
log_statements:
- context: log
statements:
# Redact API keys and tokens from tool parameters
- replace_pattern(attributes["tool.parameters"], "(?i)(api[_-]?key|secret|token|password|bearer)[\"'\\s]*[:=][\"'\\s]*[^\\s,}\"']+", "${1}=REDACTED")
# Redact AWS credentials
- replace_pattern(attributes["tool.parameters"], "AKIA[0-9A-Z]{16}", "REDACTED_AWS_KEY")
# Redact connection strings
- replace_pattern(attributes["tool.parameters"], "(postgresql|mysql|mongodb)://[^@]+@", "${1}://REDACTED@")See references/security.md for comprehensive OTTL redaction patterns.
Gap: Claude Code emits metrics and logs/events, but no distributed traces. There is no W3C traceparent propagation.
Workaround — Pseudo-trace via prompt.id correlation:
prompt.id = "prompt_abc123"
Log events sharing this prompt.id form a "trace":
→ gen_ai.user.message (prompt.id=prompt_abc123)
→ claude_code.api.request (prompt.id=prompt_abc123)
→ gen_ai.tool.message (prompt.id=prompt_abc123, tool.name=bash)
→ gen_ai.assistant.message (prompt.id=prompt_abc123)

Query in Loki/OpenSearch: {job="claude_code"} | json | prompt_id="prompt_abc123" to reconstruct a session's event timeline.
Gap: codex exec (non-interactive batch mode) drops all metrics. codex mcp-server has zero OTel instrumentation.
Status: Open issue — github.com/openai/codex/issues/12913
Workaround: Use interactive codex mode for telemetry. For codex exec pipelines, instrument the calling shell script with timing/exit code metrics via a Prometheus Pushgateway or write structured JSON logs that a filelog receiver can ingest.
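A minimal sketch of the structured-JSON-log approach; the wrapper script and log path (/var/log/codex/exec.jsonl) are hypothetical, so adapt them to your environment:

#!/usr/bin/env bash
# Hypothetical wrapper: records duration and exit code for each codex exec run
start=$(date +%s)
codex exec "$@"
exit_code=$?
printf '{"ts":"%s","event":"codex.exec.run","duration_s":%s,"exit_code":%s}\n' \
  "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(( $(date +%s) - start ))" "$exit_code" \
  >> /var/log/codex/exec.jsonl
exit "$exit_code"

Ingest the file with a filelog receiver and route it through the §3 logs pipeline:

receivers:
  filelog/codex_exec:
    include: [/var/log/codex/exec.jsonl]
    operators:
      - type: json_parser   # parses each JSON line into log attributes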
Gap: Alibaba has published telemetry documentation but the implementation code has not shipped as of 2026-03.
Action: Watch the Qwen Code changelog and the repo for the enabling commit. Do not build infrastructure dependencies on Qwen Code telemetry until code ships.
Gap: These agents emit no OTLP data. Native instrumentation is absent and no roadmap items are public.
Workaround: Use opentelemetry-hooks to wrap the agent process. This provides a practical primary instrumentation path for unsupported agents and the same outer governance/control wrapper recommended elsewhere in this guide. It emits process-level spans, metrics, and logs without requiring changes to the agent binary. See §2.7 for setup and usage guidance.
⚠️ opentelemetry-hooks captures process-level signals only (invocation duration, exit code, stdout/stderr). It complements native telemetry, but it cannot observe LLM token usage, model names, or tool calls made inside the agent. For full GenAI observability, advocate for native instrumentation via the agents' issue trackers.
Gap: No W3C traceparent propagation exists between AI coding agents. If Claude Code calls a tool that triggers Gemini CLI (or vice versa via MCP), there is no automatic trace linkage.
Workaround: Use a shared session.id or custom correlation attribute passed as metadata to link events across agents in log queries. True distributed tracing across agents is not possible today.
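One hedged pattern until real propagation exists: mint a correlation id per outer task and pass it through the standard OTEL_RESOURCE_ATTRIBUTES variable. OTel SDKs honor this variable, but whether each agent's telemetry layer does must be verified per agent, and the attribute name here is an assumption:

# Hypothetical sketch; verify each agent honors OTEL_RESOURCE_ATTRIBUTES
export AGENT_TASK_ID="task-$(uuidgen)"
export OTEL_RESOURCE_ATTRIBUTES="session.correlation_id=${AGENT_TASK_ID}"
claude <args>    # Claude Code invocation
gemini <args>    # Gemini CLI invocation
# Later: query logs by session.correlation_id to stitch both agents' events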
⚠️ Breaking Change in Semantic Conventions v1.41.0: The gen-ai conventions now require that tool call spans include the tool name for proper span naming. This affects agents using the gen_ai.* namespace for tool execution spans. Ensure your instrumentation includes the tool name when creating spans for AI agent tool calls.
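If an agent emits tool spans without the tool name, a collector-side transform can patch span names until instrumentation catches up. A sketch using OTTL Concat; the bare span-name predicate ("execute_tool") is an assumption, so match your agent's actual span names:

processors:
  transform/tool_span_names:
    trace_statements:
      - context: span
        statements:
          # Rename bare tool spans to include the tool name (predicate assumed)
          - set(name, Concat(["execute_tool", attributes["gen_ai.tool.name"]], " ")) where name == "execute_tool" and attributes["gen_ai.tool.name"] != nil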
| Agent | Uses gen_ai.* | Custom Prefix | Notes |
|---|---|---|---|
| Gemini CLI | ✅ Full | — | Follows gen_ai.* v1.40.0+ |
| GitHub Copilot | ✅ Full | — | Follows gen_ai.* v1.40.0+ |
| Claude Code | ❌ | claude_code.* | Uses OTTL transform to map (see §3) |
| Codex CLI | ❌ | codex.* | Custom event names, partial coverage |
| Qwen Code | 🔜 planned | qwen.* | Not yet verifiable |
Use the transform/normalize_agent_metrics processor from §3 to add gen_ai.system attributes to Claude Code and Codex telemetry for unified dashboard queries.
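To fold Codex into the same cross-agent queries, extend that processor with one more statement; the service.name value is an assumption, so match whatever your Codex setup actually reports:

# Addition to transform/normalize_agent_metrics (§3); service.name value assumed
- set(attributes["gen_ai.system"], "codex") where resource.attributes["service.name"] == "codex"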
For dashboards and alerting, treat gen_ai.token.type as an open set. Keep normalizations additive (for example, mapping vendor-specific cache counters into a shared label) instead of rewriting unfamiliar values away.
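For example, an additive mapping can tag Claude Code's cache counter with a shared token-type label without renaming the metric. A sketch; the cache_read value mirrors emerging semconv naming and is an assumption:

processors:
  transform/token_type_additive:
    metric_statements:
      - context: datapoint
        statements:
          # Tag the vendor cache counter with a shared label (value assumed)
          - set(attributes["gen_ai.token.type"], "cache_read") where metric.name == "claude_code.cache.read.tokens"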
OpenTelemetry upstream is discussing new semantic conventions for AI agent identity/trust and AI sandbox execution (semantic-conventions#3582, semantic-conventions#3583). These are proposals only; this skill should not present agent.* or sandbox.* as stable OpenTelemetry fields yet.
There is also an active proposal for a dedicated skill span concept (semantic-conventions#3540). Do not assume gen_ai.skill.* naming is finalized; keep skill/tool execution modeling behind collector transforms or dashboard aliasing until conventions stabilize.
Current guidance until conventions stabilize:
- Keep using gen_ai.*, core resource attributes, and vendor-specific fields that already exist.
- If you need agent identity or sandbox attributes now, use company-namespaced fields (for example, company.agent.id, company.agent.trust_level, company.sandbox.runtime) rather than betting on proposed upstream names.

When these proposals become an OTEP or merge into the semantic conventions repository, update collector transforms and dashboard examples deliberately rather than bulk-renaming attributes prematurely.
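For the company-namespaced interim fields above, a sketch of stamping them at the collector with the standard resource processor (attribute values are placeholders):

processors:
  resource/agent_identity:
    attributes:
      - key: company.agent.id
        value: claude-code-ci-01      # placeholder
        action: upsert
      - key: company.agent.trust_level
        value: supervised             # placeholder
        action: upsert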