OpenTelemetry Collector deployment, instrumentation (Java/Python/Node.js/.NET/Go), and OTTL pipeline transforms for Coralogix — coralogix exporter config, Helm chart selection, Kubernetes topology, ECS/EKS/GKE deployments, SDK setup, APM transactions, and OTTL cardinality/PII/routing.
Coralogix's APM Service Catalog and Database Catalog are powered by
metrics derived from spans by the OpenTelemetry Span Metrics
connector.
The connector converts trace data into RED metrics (Rate, Errors, Duration)
so APM can retain request / error / latency coverage even when traces are
sampled downstream, as long as spanmetrics consumes 100% of spans before
any sampling decision.
This page lists the labels that must be present on Span Metrics for the APM features to work, plus the cardinality controls that protect the backend from runaway label combinations.
Span Metrics produces four series for each unique label combination:
duration_ms_sum, duration_ms_bucket, calls_total, duration_ms_count.
Keep this minimal label set. For Coralogix Service Catalog and Database
Catalog drilldowns, do not describe these labels as optional, and do not
remove them to reduce cardinality; reduce cardinality by normalizing
unbounded values and by removing non-required dimensions instead.
| Label | Source attribute |
|---|---|
| span_name | span name |
| service_name | resource service.name |
| span_kind | span kind (SERVER, CLIENT, etc.) |
| status_code | span status code |
| collector_instance_id | connector-generated collector.instance.id dimension |
| http_method | span http.method |
| host_name | resource host.name |
| k8s_cluster_name | resource k8s.cluster.name |
| cgx.transaction | span attribute populated by CoralogixTransactionSampler |
| cgx.transaction.root | same |
| db_namespace | span db.namespace |
| db_operation_name | span db.operation.name |
| db_collection_name | span db.collection.name |
| db_system | span db.system |
| application_name | resolved from the Coralogix exporter's application_name_attributes |
| cx_subsystem_name | resolved from the Coralogix exporter's subsystem_name_attributes |
For span/resource-derived labels, if a span lacks the source attribute,
the corresponding label is empty — which can collapse APM rows or duplicate
them depending on the feature.
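For a self-managed collector, the span/resource-derived rows of the table above map roughly onto a connector dimensions block like the one below. This is a minimal sketch, not the chart's rendered config. It assumes the upstream connector's built-in default dimensions (service.name, span.name, span.kind, status.code) are left enabled, and it leaves out application_name and cx_subsystem_name because the table says those are resolved by the Coralogix exporter rather than read from spans.

```yaml
connectors:
  spanmetrics:
    # service.name, span.name, span.kind and status.code are default
    # dimensions of the upstream connector, so they are not listed here.
    # Each name below is looked up on the span, then on the resource.
    dimensions:
      - name: http.method
      - name: host.name
      - name: k8s.cluster.name
      - name: cgx.transaction
      - name: cgx.transaction.root
      - name: db.namespace
      - name: db.operation.name
      - name: db.collection.name
      - name: db.system
```

Compare the rendered result against the generated metric labels rather than trusting the block alone, since an attribute missing on the span still yields an empty label.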
Do not exclude the connector-generated collector.instance.id default
dimension (collector_instance_id after Prometheus normalization) unless
another stable writer-unique dimension is already in place; otherwise
multiple collector agents can emit colliding RED series for the same
service/span/status combination. For self-managed / upstream collectors,
verify that collector.instance.id is actually present in generated
metrics. Upstream contrib added this feature in v0.136.0; in
v0.136.0 through v0.151.x, start the collector with
--feature-gates=+connector.spanmetrics.includeCollectorInstanceID or use
another stable writer-unique dimension. In v0.152.0 and later, upstream
spanmetrics docs describe the feature gate as beta / enabled by default,
but it can still be disabled with
--feature-gates=-connector.spanmetrics.includeCollectorInstanceID or
removed with exclude_dimensions. For builds before v0.136.0, use
another writer-unique dimension or a single-writer pipeline topology.
Short answers for collector-instance safety should name both version
windows: v0.136.0 through v0.151.x needs the feature gate, while
v0.152.0+ should still be verified because collector.instance.id can be
disabled or excluded.
Span Metrics only has full RED coverage when the connector receives every span before sampling. The common Coralogix topology is:
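One arrangement that satisfies this requirement is sketched below, under stated assumptions: the forward connector fans unsampled spans out to a second traces pipeline where tail_sampling runs, while spanmetrics reads from the first, unsampled pipeline. The pipeline names, sampling policies, environment variable names, and application/subsystem values are illustrative placeholders, not values taken from the Coralogix chart.

```yaml
receivers:
  otlp:
    protocols:
      grpc: {}
      http: {}

connectors:
  spanmetrics: {}
  forward/sampled: {}

processors:
  tail_sampling:
    policies:
      - name: keep-errors            # hypothetical policy name
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-the-rest        # hypothetical policy name
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
  batch: {}

exporters:
  coralogix:
    domain: "${env:CORALOGIX_DOMAIN}"            # assumed env var name
    private_key: "${env:CORALOGIX_PRIVATE_KEY}"  # assumed env var name
    application_name: example-app                # placeholder
    subsystem_name: example-subsystem            # placeholder

service:
  pipelines:
    traces:
      receivers: [otlp]
      # Pre-spanmetrics processors (for example an attribute bridge) go here.
      exporters: [spanmetrics, forward/sampled]  # spanmetrics sees 100% of spans
    traces/sampled:
      receivers: [forward/sampled]
      processors: [tail_sampling, batch]         # sampling happens after the fan-out
      exporters: [coralogix]
    metrics:
      receivers: [spanmetrics]
      processors: [batch]
      exporters: [coralogix]
```

The design point is only the ordering: nothing that drops spans sits between the receiver and spanmetrics.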
If SDK head sampling drops spans before they leave the application, or a
collector tail_sampling processor runs before spanmetrics, the connector
only sees sampled spans. Say the symptom plainly in support answers: Span
Metrics is then based only on sampled spans, so generated calls_total,
error counts, and duration histograms undercount real traffic and can
distort rates, error percentages, and latency percentiles. This is not full
coverage, even if the sampled traces themselves look healthy.
Collector pipeline placement belongs to the opentelemetry-collector skill;
use its config-connectors.md guidance when a customer asks where to run
spanmetrics versus tail_sampling.
For new native OpenTelemetry HTTP metric data points, use the current stable
attribute http.request.method. Span Metrics is a different generated-metric
surface: the connector dimensions still read legacy http.method, so
Coralogix bridges http.request.method to http.method before the connector
consumes spans.
The connector's dimensions block uses specific source-attribute names
that don't always match what customer instrumentation emits today:
| Connector dimension | Source attribute the connector reads | What modern OTel SDKs emit |
|---|---|---|
| http_method label | http.method | http.request.method |
| db_namespace label | db.namespace | sometimes db.name; if both are absent, fall back through endpoint / system attributes so the label is not blank |
| db_operation_name label | db.operation.name | sometimes db.operation (the name customers see in the Database Monitoring docs) |
| db_system label | db.system | db.system.name in stable DB semconv |
The Coralogix Helm chart (coralogix/otel-integration/k8s-helm)
includes a transform/spanmetrics processor that bridges these so the
labels populate regardless of which name the SDK emits. Use the
coralogix/telemetry-shippers chart README as the canonical reference for
the spanMetrics.transformStatements values block
that defines this bridge; the commented defaults live in
otel-integration/k8s-helm/values.yaml.
In Helm values, these statements belong under the top-level
spanMetrics.transformStatements, not under
spanMetrics.dbMetrics.transformStatements. DB-metrics-only placement can
make db_calls_total contain db_namespace while the normal Span Metrics
calls_total series still has a blank db_namespace.
Short-answer anchor for DB namespace incidents: the pre-spanmetrics bridge
belongs in top-level spanMetrics.transformStatements / the traces pipeline
before spanmetrics. It must populate db.namespace from db.name, then
server.address, network.peer.name, net.peer.name,
network.peer.address, and finally db.system. Use the symptom wording
db_calls_total works but normal calls_total still has blank
db_namespace when the bridge lives only under
spanMetrics.dbMetrics.transformStatements.
For Fleet UI or generated values, verify the rendered collector ConfigMap,
not only the UI JSON or values.yaml. The same semantic bridge can be
present in the user input but missing from the actual traces pipeline if the
UI-to-Helm mapping writes a different key such as
span_metrics.transform_statements or a DB-only path.
The public Coralogix naming-conventions guide documents the same
transform/spanmetrics bridge and where it belongs in the traces pipeline.
Self-managed collector users must add the equivalent bridge themselves,
otherwise the labels stay empty and APM Service Catalog / Database Catalog
drilldowns appear blank. Hand off the OTTL transform authoring to the
opentelemetry-ottl skill — minimal example shape:
```yaml
processors:
  transform/spanmetrics_bridge:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          - set(attributes["http.method"], attributes["http.request.method"]) where attributes["http.method"] == nil and attributes["http.request.method"] != nil
          - set(attributes["db.system"], attributes["db.system.name"]) where attributes["db.system"] == nil and attributes["db.system.name"] != nil
          - set(attributes["db.namespace"], attributes["db.name"]) where attributes["db.namespace"] == nil and attributes["db.name"] != nil
          - set(attributes["db.namespace"], attributes["server.address"]) where attributes["db.namespace"] == nil and attributes["server.address"] != nil
          - set(attributes["db.namespace"], attributes["network.peer.name"]) where attributes["db.namespace"] == nil and attributes["network.peer.name"] != nil
          - set(attributes["db.namespace"], attributes["net.peer.name"]) where attributes["db.namespace"] == nil and attributes["net.peer.name"] != nil
          - set(attributes["db.namespace"], attributes["network.peer.address"]) where attributes["db.namespace"] == nil and attributes["network.peer.address"] != nil
          - set(attributes["db.namespace"], attributes["db.system"]) where attributes["db.namespace"] == nil and attributes["db.system"] != nil
          - set(attributes["db.operation.name"], attributes["db.operation"]) where attributes["db.operation.name"] == nil and attributes["db.operation"] != nil
          # Driver-specific collection/table names → db.collection.name.
          # Repeat for db.mongodb.collection, db.cassandra.table, etc. as
          # needed; full list at https://opentelemetry.io/docs/specs/semconv/registry/attributes/db/
          - set(attributes["db.collection.name"], attributes["db.sql.table"]) where attributes["db.collection.name"] == nil and attributes["db.sql.table"] != nil
```

Bridge runs in the traces pipeline before spanmetrics consumes
the spans. Hand SDK-specific fallback chains off to the
opentelemetry-ottl skill. If the same customer is debugging the
Database Catalog, also apply the Database Monitoring bridge for
db.namespace → db.name, db.operation.name → db.operation,
db.query.text → db.statement, and server.address /
network.peer.address → net.peer.name; see
database-monitoring.md.
The upstream Span Metrics connector has a feature gate for the latest OTel
status-code semantic convention on generated RED metrics:
spanmetrics.statusCodeConvention.useOtelPrefix.
When the gate is disabled (upstream default at introduction), generated
OTLP metric data points use the older status.code attribute with raw values
Error, Ok, and Unset. When the gate is enabled, the generated
metric data points use otel.status_code only for error and explicit-ok spans,
with raw values ERROR and OK. Unset spans omit otel.status_code; there is
no otel.status_code="UNSET" value to match in collector-side OTTL.
Short-answer anchor: this is a generated Span Metrics / RED metric
data-point attribute change, so a compatibility bridge belongs in the
metrics pipeline after spanmetrics, not in trace
spanMetrics.transformStatements.
Prometheus-facing labels may show normalized values such as
STATUS_CODE_ERROR or STATUS_CODE_OK, or may omit the status label for
unset datapoints. An OTTL transform that runs in the collector metrics pipeline
after spanmetrics sees the raw OTLP datapoint values and attribute absence,
not Prometheus label renderings. Match those raw values and the missing-attribute
case in collector-side bridges.
This is different from the trace-span bridge above. The connector has
already converted spans into metrics, so any compatibility bridge belongs in
the metrics pipeline after spanmetrics, not in the traces pipeline
before it.
If a Coralogix backend path, dashboard, or alert still expects the older metric attribute, mirror the new shape back to the old one temporarily:
```yaml
processors:
  transform/spanmetrics_status_code_compat:
    error_mode: ignore
    metric_statements:
      - context: datapoint
        statements:
          - set(attributes["status.code"], "Error") where attributes["status.code"] == nil and attributes["otel.status_code"] == "ERROR"
          - set(attributes["status.code"], "Ok") where attributes["status.code"] == nil and attributes["otel.status_code"] == "OK"
          # Scope the nil-status fallback to generated spanmetrics metrics so
          # unrelated metrics without otel.status_code do not get status.code.
          # This covers the raw OTLP metric names (`calls`, `duration`) and
          # the exposed spanmetrics series (`calls_total`, `duration_ms_*`).
          - set(attributes["status.code"], "Unset") where attributes["status.code"] == nil and attributes["otel.status_code"] == nil and IsMatch(metric.name, "(^|[._])calls(_total)?$|(^|[._])duration(_ms_(sum|bucket|count))?$") and (attributes["span.name"] != nil or attributes["span_name"] != nil) and (attributes["span.kind"] != nil or attributes["span_kind"] != nil)

service:
  pipelines:
    metrics:
      receivers: [spanmetrics]
      processors: [transform/spanmetrics_status_code_compat, batch]
      exporters: [coralogix]
```

If Coralogix has already moved to the newer shape and a customer still emits the old one, mirror in the other direction:

```yaml
processors:
  transform/spanmetrics_status_code_to_otel:
    error_mode: ignore
    metric_statements:
      - context: datapoint
        statements:
          - set(attributes["otel.status_code"], "ERROR") where attributes["otel.status_code"] == nil and attributes["status.code"] == "Error"
          - set(attributes["otel.status_code"], "OK") where attributes["otel.status_code"] == nil and attributes["status.code"] == "Ok"
          # Do not set otel.status_code for status.code == "Unset"; the newer
          # convention represents unset by omitting otel.status_code.
```

Do not treat this as an instrumentation downgrade request. First check which feature-gate state the collector is using, then make the Coralogix consumer support both forms or apply a temporary metrics-pipeline bridge.
Add these to the connector's dimensions to enable the corresponding APM
feature:
| Add label | Enables |
|---|---|
| rpc.grpc.status_code, http.response.status_code | API-error tracking |
| service.version | service grouping by version |
| Owner-specific workload label: k8s.deployment.name, k8s.statefulset.name, k8s.daemonset.name, k8s.job.name, k8s.cronjob.name, or k8s.replicaset.name | service ↔ k8s correlation in the Resources Catalog; use whichever attribute matches the pod owner, do not invent k8s.deployment.name for non-Deployment workloads |
| k8s.pod.name | avoiding collisions when multiple collector agents / replicas write span metrics for the same service_name, including the common one-agent-per-node DaemonSet topology when a service has pods on multiple nodes. This is high-cardinality / pod-churn-sensitive; use it only for the collision case, not as a blanket default. |
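As a rough illustration for a self-managed collector, the optional labels in the table can be added as extra entries in the connector's dimensions list. The sketch assumes the required dimensions are already configured in the same list, and it uses k8s.deployment.name only as an example of an owner-specific attribute; substitute the one that matches the pod owner.

```yaml
connectors:
  spanmetrics:
    dimensions:
      # Keep the required dimensions that are already configured in this list.
      - name: http.response.status_code  # API-error tracking (HTTP)
      - name: rpc.grpc.status_code       # API-error tracking (gRPC)
      - name: service.version            # service grouping by version
      - name: k8s.deployment.name        # only when the pods are owned by a Deployment
      # k8s.pod.name is deliberately left out: add it only for the
      # cross-writer collision case described in the table.
```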
Cardinality cap (aggregation_cardinality_limit)

Dynamic values in span names or labels (UUIDs, full URL paths with
parameters, raw SQL, session IDs) explode the number of unique series.
The Span Metrics connector can enforce a per-service-per-metric cap and
redirect overflow to a fallback series. Raw OTLP metric datapoints carry
the attribute otel.metric.overflow="true"; PromQL / Prometheus-normalized
labels show the same marker as otel_metric_overflow="true".

The default depends on how the connector is deployed:

- Coralogix Helm chart (coralogix/otel-integration/k8s-helm): starting
  with chart v0.0.203, ships with the cap pre-configured at 100,000 via
  the spanMetrics.aggregationCardinalityLimit preset. Overflow protection
  is on by default for those chart versions. For chart versions before
  v0.0.203, set spanMetrics.aggregationCardinalityLimit explicitly.
- Upstream spanmetrics connector: ships with
  aggregation_cardinality_limit: 0, which disables the cap. Overflow
  protection only kicks in when the field is set explicitly.

If you are running your own collector, set the limit explicitly:
```yaml
connectors:
  spanmetrics:
    aggregation_cardinality_limit: 100000 # 0 (upstream default) disables the cap
```

For the Coralogix Helm chart, the equivalent preset key (already set to
100,000 starting with chart v0.0.203; set it explicitly on older chart
versions):

```yaml
spanMetrics:
  aggregationCardinalityLimit: 100000 # set to 0 to disable
```

When overflow happens, additional unique label combinations are aggregated into the fallback series. Example with a limit of 3:

```
calls_total{service_name="A", span_name="uuid1"}
calls_total{service_name="A", span_name="uuid2"}
calls_total{service_name="A", span_name="uuid3"}
calls_total{service_name="A", otel_metric_overflow="true"} # PromQL / Prometheus-normalized label
```

Before Prometheus normalization, the same fallback data point has raw OTLP attributes like:

```
service.name = "A"
otel.metric.overflow = "true"
```

The cardinality cache:
- With the Coralogix Helm chart default (metrics_expiration: 5m), resets
  automatically after 5 minutes of no data for that service. The
  upstream spanmetrics connector defaults to metrics_expiration: 0
  (no expiration), so self-managed users must set the field explicitly
  or restart the collector to clear cardinality state.

Use the Prometheus-normalized label form in PromQL:

```promql
sum by (service_name) (duration_ms_bucket{otel_metric_overflow="true"}) > 0
```

In the collector debug exporter, OTTL processors, or OTLP metrics before
Prometheus normalization, look for the raw data-point attribute
otel.metric.overflow="true" instead. Either form means a single service
started overflowing before the volume became significant.
| Knob | What it does |
|---|---|
| metrics_expiration | how long unused metric series stay in memory before being dropped (helm default: 5m) |
| aggregation_temporality | cumulative (totals since start) vs delta (changes since last interval) |
| exemplars.enabled | attaches trace IDs to metric points for metric-to-trace jumps; not required for RED monitoring |
| resource_metrics_key_attributes | resource attributes used to build the metric resource hash key. Include something that distinguishes the writer (service.instance.id is the canonical incoming resource attribute) when multiple collectors share a service.name, otherwise their counters collide. collector.instance.id is generated by the connector as a metric dimension, not an incoming resource attribute, so it can't be referenced here unless you've explicitly set it as a resource attribute upstream. |
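For a self-managed collector, the knobs in the table sit directly under the connector. The values below are illustrative rather than recommended: 5m mirrors the Helm default mentioned above, and the cumulative temporality and key attributes are assumptions chosen to match the writer-uniqueness advice in the table.

```yaml
connectors:
  spanmetrics:
    # Drop series that have received no data for 5 minutes.
    metrics_expiration: 5m
    # Or AGGREGATION_TEMPORALITY_DELTA for per-interval changes.
    aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"
    # Optional: attach trace IDs to metric points for metric-to-trace jumps.
    exemplars:
      enabled: true
    # service.instance.id keeps writers apart when several collectors
    # receive spans for the same service.name.
    resource_metrics_key_attributes:
      - service.name
      - service.instance.id
```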
Coralogix Helm and customer configs can expose both global collection intervals and connector-specific flush intervals.
| Setting | Meaning |
|---|---|
| global.collectionInterval / spanMetrics.collectionInterval | Helm preset-level interval used by Coralogix chart rendering. |
| metrics_flush_interval | Upstream connector setting controlling how often Span Metrics emits generated metrics. |
| metrics_expiration | How long unused series stay in memory before expiration. |
When a customer is reducing volume or debugging single-writer/cardinality behavior, use the setting that actually renders into the connector. Do not assume a key named like an upstream field exists in the Helm values schema.
Labels to avoid on span metrics:

- k8s.pod.name as a span-metric data-point dimension: explodes
  proportional to pod churn. Keep it on the resource (Infrastructure
  Explorer needs it there) and include it in the connector's dimensions
  only when it prevents cross-writer collisions, such as one collector
  agent per Kubernetes node writing span metrics for pods of the same
  service on different nodes.
- url.path (with parameters) as a metric label: use http.route (the
  templated form) instead.
- url.full or full query strings as a metric label: safe for trace
  detail, not for RED metric dimensions.
- user.id / session IDs: span scope only. They explode metrics.

APM and Database Catalog query generated labels, not raw span attributes. When a product tab is empty, inspect both forms:
- the raw span or resource attribute before spanmetrics;
- the generated metric label after spanmetrics;

Examples:

- http.response.status_code can become http_response_status_code.
- db.system can become db_system.
- service.name can become service_name.
- otel.metric.overflow becomes otel_metric_overflow.

If the raw attribute exists but the generated label is blank, the bridge is probably in the wrong path or the connector dimensions do not include that source.