dash0/agent-skills

Expert guidance for configuring and deploying the OpenTelemetry Collector. Use when setting up a Collector pipeline, configuring receivers, exporters, or processors, deploying a Collector to Kubernetes or Docker, or forwarding telemetry to Dash0. Triggers on requests involving collector, pipeline, OTLP receiver, exporter, or Dash0 collector setup.



title: RED metrics from traces
impact: HIGH
tags: red-metrics, signaltometrics, connectors, histograms, sampling, semantic-conventions

RED metrics from traces

RED metrics (request rate, error rate, duration distributions) derived from traces provide accurate service-level indicators without application-code changes. Use the signaltometricsconnector to materialize these metrics in the Collector with metric names and attributes that match the OpenTelemetry semantic conventions.

Choosing a connector

| Connector | Semconv alignment | Configuration | Use when |
| --- | --- | --- | --- |
| signaltometricsconnector | Exact: metric names, units, and attributes match the semantic conventions | OTTL expressions define each metric explicitly | Semconv compliance matters (recommended) |
| spanmetricsconnector | Partial: attributes can align, but metric names are generic ({namespace}.duration, {namespace}.calls) | Minimal: RED metrics are built in | You need a quick setup and do not require semconv-compliant metric names |

Use signaltometricsconnector for production deployments where dashboards, alerts, and SLOs rely on standard metric names. Use spanmetricsconnector only for prototyping or environments where semconv metric names are not required.
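For comparison, a prototyping setup with the spanmetrics connector might look like the sketch below. The namespace value and the choice of dimensions are illustrative assumptions, not part of this rule; check the connector's README for the defaults in your Collector version.

```yaml
connectors:
  spanmetrics:
    # Assumption: illustrative namespace. The emitted metrics are named
    # {namespace}.duration and {namespace}.calls, not semconv names.
    namespace: span.metrics
    histogram:
      unit: s
      exponential:
        max_size: 160
    # Span attributes copied onto the generated metrics as dimensions.
    dimensions:
      - name: http.request.method
      - name: http.response.status_code
```

The connector is wired into pipelines the same way as signaltometrics: as an exporter of a traces pipeline and a receiver of a metrics pipeline.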

Why materialization matters

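Accurate RED metrics cannot be computed from sampled traces. Head sampling drops a share of traces up front, so counts are underestimated and histograms are built from a thinner, noisier sample. Tail sampling deliberately keeps errors and slow requests, so both are overrepresented.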

Generate metrics from all spans before any sampling occurs. Place the metrics connector upstream of the sampling processor so it sees every span.
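A worked example makes the bias concrete. Assume, purely for illustration, N = 1000 requests of which e = 10 fail, none slow enough to match a latency policy. A tail-sampling setup that keeps every error plus 10% of the remaining traces, and a head sampler with probability p = 0.1, would then yield in expectation:

```latex
\begin{align*}
\text{true error ratio} &= \frac{e}{N} = \frac{10}{1000} = 1\% \\
\text{error ratio after tail sampling} &= \frac{e}{e + 0.1\,(N - e)} = \frac{10}{10 + 99} \approx 9.2\% \\
\mathbb{E}[\text{request count after head sampling}] &= pN = 0.1 \times 1000 = 100
\end{align*}
```

Under these assumptions the tail-sampled data overstates the error ratio roughly ninefold, and the head-sampled data reports a tenth of the real request count.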

Metric definitions

The HTTP metric semantic conventions and RPC metric semantic conventions define four duration histograms. Each metric targets a specific combination of protocol and span kind.

| Metric name | Signal | Condition |
| --- | --- | --- |
| http.server.request.duration | HTTP server spans | kind == SPAN_KIND_SERVER and http.request.method present |
| http.client.request.duration | HTTP client spans | kind == SPAN_KIND_CLIENT and http.request.method present |
| rpc.server.call.duration | RPC server spans | kind == SPAN_KIND_SERVER and rpc.system.name present |
| rpc.client.call.duration | RPC client spans | kind == SPAN_KIND_CLIENT and rpc.system.name present |

All four metrics are histograms measured in seconds (s).

Resource attributes

The include_resource_attributes setting controls which resource attributes are attached to the generated metrics. Include the required and recommended resource attributes so that generated metrics can be filtered by service, version, environment, and Kubernetes context.

| Attribute | Why |
| --- | --- |
| service.name | Identifies the service producing the metric |
| service.version | Enables version-aware dashboards and regression detection |
| service.namespace | Scopes services within the same product |
| service.instance.id | Distinguishes individual instances |
| deployment.environment.name | Separates production from staging and development |
| k8s.namespace.name | Kubernetes namespace (set by k8sattributes processor) |
| k8s.deployment.name | Kubernetes workload name (set by k8sattributes processor) |
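The two Kubernetes attributes only reach the generated metrics if they are already on the span's resource when the connector runs. A minimal sketch, assuming a memory_limiter processor and OTLP receiver and exporter are defined elsewhere in the same file:

```yaml
processors:
  k8sattributes:
    extract:
      metadata:
        # Resource attributes referenced in include_resource_attributes.
        - k8s.namespace.name
        - k8s.deployment.name

service:
  pipelines:
    traces:
      receivers: [otlp]
      # k8sattributes runs before the connector, so spans already carry
      # the Kubernetes resource attributes when metrics are generated.
      processors: [memory_limiter, k8sattributes]
      exporters: [signaltometrics, otlp]
```

The k8sattributes processor needs permission to read pod metadata from the Kubernetes API, so the Collector's service account must be granted the corresponding RBAC rules.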

HTTP metrics configuration

connectors:
  signaltometrics:
    spans:
      # HTTP server duration — https://opentelemetry.io/docs/specs/semconv/http/http-metrics/
      - name: http.server.request.duration
        description: "Duration of HTTP server requests."
        unit: s
        conditions:
          - kind == SPAN_KIND_SERVER and attributes["http.request.method"] != nil
        attributes:
          - key: http.request.method
          - key: http.response.status_code
            optional: true
          - key: http.route
            optional: true
          - key: url.scheme
            optional: true
          - key: error.type
            optional: true
          - key: network.protocol.version
            optional: true
        include_resource_attributes:
          - key: service.name
          - key: service.version
            optional: true
          - key: service.namespace
            optional: true
          - key: service.instance.id
            optional: true
          - key: deployment.environment.name
            optional: true
          - key: k8s.namespace.name
            optional: true
          - key: k8s.deployment.name
            optional: true
        exponential_histogram:
          max_size: 160
          value: Seconds(end_time - start_time)
          count: "1"

      # HTTP client duration — https://opentelemetry.io/docs/specs/semconv/http/http-metrics/
      - name: http.client.request.duration
        description: "Duration of HTTP client requests."
        unit: s
        conditions:
          - kind == SPAN_KIND_CLIENT and attributes["http.request.method"] != nil
        attributes:
          - key: http.request.method
          - key: http.response.status_code
            optional: true
          - key: error.type
            optional: true
          - key: server.address
            optional: true
          - key: server.port
            optional: true
          - key: network.protocol.version
            optional: true
        include_resource_attributes:
          - key: service.name
          - key: service.version
            optional: true
          - key: service.namespace
            optional: true
          - key: service.instance.id
            optional: true
          - key: deployment.environment.name
            optional: true
          - key: k8s.namespace.name
            optional: true
          - key: k8s.deployment.name
            optional: true
        exponential_histogram:
          max_size: 160
          value: Seconds(end_time - start_time)
          count: "1"

RPC metrics configuration

connectors:
  signaltometrics:
    spans:
      # RPC server duration — https://opentelemetry.io/docs/specs/semconv/rpc/rpc-metrics/
      - name: rpc.server.call.duration
        description: "Duration of inbound RPC calls."
        unit: s
        conditions:
          - kind == SPAN_KIND_SERVER and attributes["rpc.system.name"] != nil
        attributes:
          - key: rpc.system.name
          - key: rpc.method
            optional: true
          - key: rpc.response.status_code
            optional: true
          - key: error.type
            optional: true
          - key: server.address
            optional: true
          - key: server.port
            optional: true
        include_resource_attributes:
          - key: service.name
          - key: service.version
            optional: true
          - key: service.namespace
            optional: true
          - key: service.instance.id
            optional: true
          - key: deployment.environment.name
            optional: true
          - key: k8s.namespace.name
            optional: true
          - key: k8s.deployment.name
            optional: true
        exponential_histogram:
          max_size: 160
          value: Seconds(end_time - start_time)
          count: "1"

      # RPC client duration — https://opentelemetry.io/docs/specs/semconv/rpc/rpc-metrics/
      - name: rpc.client.call.duration
        description: "Duration of outbound RPC calls."
        unit: s
        conditions:
          - kind == SPAN_KIND_CLIENT and attributes["rpc.system.name"] != nil
        attributes:
          - key: rpc.system.name
          - key: rpc.method
            optional: true
          - key: rpc.response.status_code
            optional: true
          - key: error.type
            optional: true
          - key: server.address
            optional: true
          - key: server.port
            optional: true
        include_resource_attributes:
          - key: service.name
          - key: service.version
            optional: true
          - key: service.namespace
            optional: true
          - key: service.instance.id
            optional: true
          - key: deployment.environment.name
            optional: true
          - key: k8s.namespace.name
            optional: true
          - key: k8s.deployment.name
            optional: true
        exponential_histogram:
          max_size: 160
          value: Seconds(end_time - start_time)
          count: "1"

Explicit-bucket fallback

Use explicit-bucket histograms only when the backend does not support exponential histograms (e.g., when exporting to Prometheus via prometheusremotewrite). Replace the exponential_histogram block with a histogram block using the bucket boundaries from the semantic conventions:

        histogram:
          buckets: [0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10]
          value: Seconds(end_time - start_time)
          count: "1"
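To show where the block sits, here is a sketch of one complete metric entry with the fallback in place. It is trimmed to a few attributes for brevity; a real entry would keep the full attribute lists from the configurations above.

```yaml
connectors:
  signaltometrics:
    spans:
      - name: http.server.request.duration
        description: "Duration of HTTP server requests."
        unit: s
        conditions:
          - kind == SPAN_KIND_SERVER and attributes["http.request.method"] != nil
        attributes:
          - key: http.request.method
          - key: http.response.status_code
            optional: true
        include_resource_attributes:
          - key: service.name
        # Explicit buckets take the place of the exponential_histogram block.
        histogram:
          buckets: [0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10]
          value: Seconds(end_time - start_time)
          count: "1"
```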

Pipeline wiring

The signaltometrics connector acts as an exporter in the traces pipeline and a receiver in a metrics pipeline. It sees every span before the backend receives the (possibly sampled) subset, producing accurate counts and duration histograms.

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [signaltometrics, otlp]
    metrics/red:
      receivers: [signaltometrics]
      processors: [memory_limiter]
      exporters: [otlp]

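List the signaltometrics connector alongside the backend exporter in the traces pipeline's exporters list. The order of the entries does not matter: the pipeline fans out, and the connector receives a copy of every span independently of the backend exporter. Processors, however, run before the fan-out, so any sampling processor in this pipeline would also reduce what the connector sees.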
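The pipelines above reference an otlp receiver and an otlp exporter that must be defined in the same file. A minimal sketch follows; the exporter endpoint is a hypothetical placeholder, not a real backend address, and authentication headers are omitted.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

exporters:
  otlp:
    # Assumption: placeholder endpoint; replace with your backend's OTLP/gRPC address.
    endpoint: otlp.backend.example.com:4317
```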

Full configuration with tail sampling

When combining RED metric materialization with tail sampling, the gateway must run both the signaltometricsconnector and the tailsamplingprocessor. Processors in a pipeline run before its exporters, so a connector listed as an exporter only sees the spans that the pipeline's processors let through. To let the connector see all spans, keep tail_sampling out of the ingest pipeline: the traces pipeline feeds both the signaltometrics connector and a forward connector, and a second pipeline (traces/sampled) applies tail_sampling before exporting to the backend.

connectors:
  # Passes all spans unchanged from the traces pipeline to traces/sampled.
  forward:
  signaltometrics:
    spans:
      - name: http.server.request.duration
        description: "Duration of HTTP server requests."
        unit: s
        conditions:
          - kind == SPAN_KIND_SERVER and attributes["http.request.method"] != nil
        attributes:
          - key: http.request.method
          - key: http.response.status_code
            optional: true
          - key: http.route
            optional: true
          - key: url.scheme
            optional: true
          - key: error.type
            optional: true
        include_resource_attributes:
          - key: service.name
          - key: service.version
            optional: true
          - key: deployment.environment.name
            optional: true
        exponential_histogram:
          max_size: 160
          value: Seconds(end_time - start_time)
          count: "1"

      - name: http.client.request.duration
        description: "Duration of HTTP client requests."
        unit: s
        conditions:
          - kind == SPAN_KIND_CLIENT and attributes["http.request.method"] != nil
        attributes:
          - key: http.request.method
          - key: http.response.status_code
            optional: true
          - key: error.type
            optional: true
          - key: server.address
            optional: true
          - key: server.port
            optional: true
        include_resource_attributes:
          - key: service.name
          - key: service.version
            optional: true
          - key: deployment.environment.name
            optional: true
        exponential_histogram:
          max_size: 160
          value: Seconds(end_time - start_time)
          count: "1"

      - name: rpc.server.call.duration
        description: "Duration of inbound RPC calls."
        unit: s
        conditions:
          - kind == SPAN_KIND_SERVER and attributes["rpc.system.name"] != nil
        attributes:
          - key: rpc.system.name
          - key: rpc.method
            optional: true
          - key: rpc.response.status_code
            optional: true
          - key: error.type
            optional: true
        include_resource_attributes:
          - key: service.name
          - key: service.version
            optional: true
          - key: deployment.environment.name
            optional: true
        exponential_histogram:
          max_size: 160
          value: Seconds(end_time - start_time)
          count: "1"

      - name: rpc.client.call.duration
        description: "Duration of outbound RPC calls."
        unit: s
        conditions:
          - kind == SPAN_KIND_CLIENT and attributes["rpc.system.name"] != nil
        attributes:
          - key: rpc.system.name
          - key: rpc.method
            optional: true
          - key: rpc.response.status_code
            optional: true
          - key: error.type
            optional: true
          - key: server.address
            optional: true
          - key: server.port
            optional: true
        include_resource_attributes:
          - key: service.name
          - key: service.version
            optional: true
          - key: deployment.environment.name
            optional: true
        exponential_histogram:
          max_size: 160
          value: Seconds(end_time - start_time)
          count: "1"

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1638
    spike_limit_mib: 400
  tail_sampling:
    decision_wait: 30s
    num_traces: 100000
    policies:
      - name: errors
        type: status_code
        status_code:
          status_codes:
            - ERROR
      - name: slow-traces
        type: latency
        latency:
          threshold_ms: 1000
      - name: baseline
        type: probabilistic
        probabilistic:
          sampling_percentage: 10

service:
  pipelines:
    # Ingest pipeline: no sampling here, so the connector sees every span.
    traces:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [signaltometrics, forward]
    # Sampling pipeline: only sampled traces reach the backend.
    traces/sampled:
      receivers: [forward]
      processors: [tail_sampling]
      exporters: [otlp]
    metrics/red:
      receivers: [signaltometrics]
      processors: [memory_limiter]
      exporters: [otlp]
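One way to check what the connector emits while rolling this out is to add the debug exporter to the metrics pipeline for a short time. This is a troubleshooting sketch, not part of the recommended production setup; it assumes the memory_limiter processor and otlp exporter are defined elsewhere in the file.

```yaml
exporters:
  debug:
    # Prints every generated data point, including its attributes.
    verbosity: detailed

service:
  pipelines:
    metrics/red:
      receivers: [signaltometrics]
      processors: [memory_limiter]
      exporters: [debug, otlp]
```

Inspect the Collector logs for the metric names, units, and attribute keys, then remove the debug exporter again, because detailed output is verbose under production load.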

Anti-patterns

  • Computing RED metrics from sampled traces. Dashboards and alerts that depend on trace-derived metrics show skewed data. Always generate metrics from spans before the sampling step.
  • Using generic metric names instead of semconv names. Metrics named span.metrics.duration cannot be correlated with SDK-generated metrics like http.server.request.duration. Use the signaltometricsconnector to produce metrics with exact semantic convention names.
  • Missing semconv attributes on generated metrics. Without http.request.method, http.response.status_code, and http.route, the generated metrics cannot be filtered the same way as SDK-produced metrics. Always include the semantic convention attributes for each metric.
  • Adding high-cardinality attributes. Attributes like url.full, http.target, or user IDs produce unbounded cardinality. Only add low-cardinality attributes listed in the semantic conventions.
  • Omitting resource attributes. Without service.version and deployment.environment.name in include_resource_attributes, generated metrics cannot be filtered by version or environment. Include the required and recommended resource attributes.
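The cardinality and attribute anti-patterns can be read off a single metric entry. The sketch below is illustrative: it keeps the low-cardinality semconv attributes and shows, commented out, the kind of key to avoid.

```yaml
connectors:
  signaltometrics:
    spans:
      - name: http.server.request.duration
        description: "Duration of HTTP server requests."
        unit: s
        conditions:
          - kind == SPAN_KIND_SERVER and attributes["http.request.method"] != nil
        attributes:
          - key: http.request.method
          - key: http.response.status_code
            optional: true
          # Route template (for example /users/{id}), so cardinality stays bounded.
          - key: http.route
            optional: true
          # Avoid: the full URL is different for almost every request.
          # - key: url.full
        include_resource_attributes:
          - key: service.name
          - key: service.version
            optional: true
          - key: deployment.environment.name
            optional: true
        exponential_histogram:
          max_size: 160
          value: Seconds(end_time - start_time)
          count: "1"
```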
