OpenTelemetry Collector deployment, instrumentation (Java/Python/Node.js/.NET/Go), and OTTL pipeline transforms for Coralogix — coralogix exporter config, Helm chart selection, Kubernetes topology, ECS/EKS/GKE deployments, SDK setup, APM transactions, and OTTL cardinality/PII/routing.
Fargate doesn't expose a host to run a daemonset, so the Coralogix pattern is a sidecar collector in every task definition. The app container and the collector share the task's network namespace (awsvpc), so apps send OTLP to localhost:4317.
Image: coralogixrepo/coralogix-otel-collector:<version> (CDOT) or otel/opentelemetry-collector-contrib:<version>. Prefer CDOT because it includes the healthcheck binary referenced below.

{
  "family": "my-fargate-app",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "containerDefinitions": [
    {
      "name": "app",
      "image": "my-app:1.2.3",
      "essential": true,
      "dependsOn": [
        { "containerName": "otel-collector", "condition": "HEALTHY" }
      ],
      "environment": [
        { "name": "OTEL_EXPORTER_OTLP_ENDPOINT", "value": "http://localhost:4317" }
      ]
    },
    {
      "name": "otel-collector",
      "image": "coralogixrepo/coralogix-otel-collector:v0.5.0",
      "essential": false,
      "command": ["--config", "env:OTEL_CONFIG"],
      "environment": [
        { "name": "CORALOGIX_DOMAIN", "value": "eu2.coralogix.com" }
      ],
      "secrets": [
        { "name": "CORALOGIX_PRIVATE_KEY", "valueFrom": "arn:aws:secretsmanager:...:coralogix-private-key" },
        { "name": "OTEL_CONFIG", "valueFrom": "arn:aws:ssm:...:parameter/otel-config" }
      ],
      "healthCheck": {
        "command": ["CMD", "/healthcheck"],
        "interval": 5,
        "timeout": 3,
        "retries": 3,
        "startPeriod": 10
      }
    }
  ]
}

The bug: if the app container is essential: true and the OTel sidecar is essential: false, and the app crashes within the first couple of seconds, ECS terminates the entire task, including the sidecar, before the collector has finished its ~2–5s startup. Logs buffered by FireLens never drain, and traces/metrics emitted right before the crash are lost.
The fix: make the app container depend on the sidecar being healthy before the app starts. That requires (a) the collector to expose a healthcheck and (b) the app's task-definition dependsOn to gate on condition: HEALTHY.
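The two requirements above can be checked mechanically before a task definition is registered. The following is a minimal sketch, not part of any Coralogix or AWS tooling: the function name `startup_gate_problems` is hypothetical, and it assumes the task definition has already been loaded into a plain dict with the container names used in this document.

```python
def startup_gate_problems(task_def, app="app", collector="otel-collector"):
    """Return problems with the startup gate; an empty list means it is in place."""
    containers = {c["name"]: c for c in task_def.get("containerDefinitions", [])}
    missing = [name for name in (app, collector) if name not in containers]
    if missing:
        return [f"container {name!r} not found" for name in missing]

    problems = []
    # (a) HEALTHY can only be reached if the collector defines a health check.
    if not containers[collector].get("healthCheck", {}).get("command"):
        problems.append(f"{collector!r} has no healthCheck command")
    # (b) The app must wait for the collector to report HEALTHY.
    gated = any(
        dep.get("containerName") == collector and dep.get("condition") == "HEALTHY"
        for dep in containers[app].get("dependsOn", [])
    )
    if not gated:
        problems.append(f"{app!r} does not depend on {collector!r} being HEALTHY")
    return problems


# Trimmed-down version of the task definition from this document.
task_def = {
    "containerDefinitions": [
        {
            "name": "app",
            "essential": True,
            "dependsOn": [{"containerName": "otel-collector", "condition": "HEALTHY"}],
        },
        {
            "name": "otel-collector",
            "essential": False,
            "healthCheck": {"command": ["CMD", "/healthcheck"]},
        },
    ]
}

print(startup_gate_problems(task_def))  # prints []
```

A check like this could run in CI against the rendered task definition, so a missing dependsOn entry is caught before deployment rather than after a lost crash trace.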
CDOT (coralogixrepo/coralogix-otel-collector) ships with a /healthcheck binary exactly for this (see the task definition above). Users on otel/opentelemetry-collector-contrib who don't want to switch can enable the health_check extension and probe it with curl -sf localhost:13133 instead (this requires a shell and curl inside the collector image, which minimal images may lack):
extensions:
  health_check:
    endpoint: "0.0.0.0:13133"
service:
  extensions: [health_check]

"healthCheck": {
  "command": ["CMD-SHELL", "curl -sf http://localhost:13133/ || exit 1"],
  "interval": 5, "timeout": 3, "retries": 3, "startPeriod": 10
}

CDOT and upstream OTel Contrib both honor the env: and envprovider providers in --config. Common Fargate patterns:
- Environment variable: inject OTEL_CONFIG via task secrets. command: ["--config", "env:OTEL_CONFIG"].
- S3: command: ["--config", "s3://my-bucket.s3.us-east-1.amazonaws.com/config.yaml"] and give the task role s3:GetObject on the key.

Execution role:
- secretsmanager:GetSecretValue on the CORALOGIX_PRIVATE_KEY secret.
- ssm:GetParameter(s) on the config parameter when injecting OTEL_CONFIG from Parameter Store via task secrets.

Task role:
- No extra permissions when the config is injected as OTEL_CONFIG.
- When loading the config from s3://..., grant s3:GetObject on the config object (and s3:GetBucketLocation if the provider needs to resolve the bucket region).
- If the collector uses AWS exporters (e.g. awsxray), they need the relevant permissions.

For applications that stream stdout/stderr as logs, FireLens routes those logs into the OTel sidecar via awsfirelens:

"logConfiguration": {
  "logDriver": "awsfirelens",
  "options": {
    "Name": "forward",
    "Host": "localhost",
    "Port": "24224"
  }
}

The sidecar runs a fluentforward receiver on port 24224, routing the stream through the OTel pipeline. An alternative log driver is awslogs: awslogs → CloudWatch → Coralogix via the Firehose integration (out of scope here).
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"
  fluentforward:
    endpoint: "0.0.0.0:24224"
  awsecscontainermetrics:
    collection_interval: 60s
processors:
  memory_limiter:
    check_interval: 2s
    limit_mib: 384
  resourcedetection:
    detectors: [env, ecs, system] # on Fargate, `ecs` IS correct: collector shares task with app
  batch:
exporters:
  coralogix:
    domain: "${env:CORALOGIX_DOMAIN}"
    private_key: "${env:CORALOGIX_PRIVATE_KEY}"
    application_name_attributes: ["aws.ecs.task.family"]
    subsystem_name_attributes: ["aws.ecs.container.name"]
service:
  pipelines:
    logs:
      receivers: [fluentforward, otlp]
      processors: [memory_limiter, resourcedetection, batch]
      exporters: [coralogix]
    metrics:
      receivers: [awsecscontainermetrics, otlp]
      processors: [memory_limiter, resourcedetection, batch]
      exporters: [coralogix]
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resourcedetection, batch]
      exporters: [coralogix]

Note: unlike ECS EC2 daemonset, ecs detector IS correct here: the collector shares the task with the app, so task metadata maps to the app's container.
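Because the whole config travels as a single OTEL_CONFIG value, a wiring mistake only surfaces when the sidecar fails to start. The sketch below shows one way to catch the most common ones offline. It assumes the YAML has already been parsed into a dict (for example with a YAML library, not shown here); the function name `pipeline_problems` is hypothetical, and the memory_limiter-first check reflects the upstream recommendation rather than a rule the collector enforces.

```python
def pipeline_problems(config):
    """Return wiring problems found in a collector config given as a dict."""
    problems = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for signal, pipeline in pipelines.items():
        for kind in ("receivers", "processors", "exporters"):
            defined = config.get(kind) or {}
            for name in pipeline.get(kind, []):
                # Every component a pipeline references must be declared.
                if name not in defined:
                    problems.append(f"{signal}: {name!r} is not defined under {kind}")
        processors = pipeline.get("processors", [])
        if "memory_limiter" in processors and processors[0] != "memory_limiter":
            problems.append(f"{signal}: memory_limiter should be the first processor")
    return problems


# Dict form of the config from this document (component settings omitted).
PROCESSORS = ["memory_limiter", "resourcedetection", "batch"]
config = {
    "receivers": {"otlp": {}, "fluentforward": {}, "awsecscontainermetrics": {}},
    # An empty YAML entry such as `batch:` parses to None; the key still counts.
    "processors": {"memory_limiter": {}, "resourcedetection": {}, "batch": None},
    "exporters": {"coralogix": {}},
    "service": {
        "pipelines": {
            "logs": {"receivers": ["fluentforward", "otlp"],
                     "processors": PROCESSORS, "exporters": ["coralogix"]},
            "metrics": {"receivers": ["awsecscontainermetrics", "otlp"],
                        "processors": PROCESSORS, "exporters": ["coralogix"]},
            "traces": {"receivers": ["otlp"],
                       "processors": PROCESSORS, "exporters": ["coralogix"]},
        }
    },
}

print(pipeline_problems(config))  # prints []
```

This only checks references and processor order; it does not validate component settings, which the collector itself does at startup.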
Tail sampling needs a full trace view. For Fargate-only users whose traces span multiple ECS clusters, the sidecar model alone can't deliver complete traces to a single sampler. The documented pattern is a centralized collector on ECS EC2 (or EKS) that all Fargate sidecars forward to via the loadbalancing exporter keyed on trace_id:

Fargate sidecar ─► loadbalancing (by trace_id) ─► Central EC2 collector ─► Coralogix
                                                  (tail_sampling + spanmetrics here)

The same pattern works for pure ECS Fargate by replacing the EKS half with another ECS cluster: the sidecar load-balances by trace_id to a central EC2 collector that owns tail_sampling + spanmetrics.
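To illustrate why keying on trace_id matters, the sketch below maps a trace ID to one of several central collectors so that every span of a trace reaches the same tail sampler, regardless of which sidecar emitted it. It is a simplified stand-in using hash-modulo selection and does not reproduce the loadbalancing exporter's own algorithm; the function name `pick_backend` and the backend addresses are hypothetical.

```python
import hashlib


def pick_backend(trace_id_hex, backends):
    """Map a trace ID to one backend so all spans of a trace land together."""
    if not backends:
        raise ValueError("at least one backend is required")
    ordered = sorted(backends)  # make the choice independent of list order
    digest = hashlib.sha256(bytes.fromhex(trace_id_hex)).digest()
    return ordered[int.from_bytes(digest[:8], "big") % len(ordered)]


# Hypothetical central collector addresses.
BACKENDS = ["collector-a:4317", "collector-b:4317", "collector-c:4317"]
trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"

# Two sidecars that see the backends in a different order still agree.
first = pick_backend(trace_id, BACKENDS)
again = pick_backend(trace_id, list(reversed(BACKENDS)))
print(first == again)  # prints True
```

One limitation of plain modulo hashing is that adding or removing a backend remaps most trace IDs, which can split in-flight traces across samplers; consistent hashing schemes reduce that reshuffling.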
Pitfalls:
- Without the dependsOn: HEALTHY pattern, app crashes within the first ~5s lose buffered logs/traces.
- Apps send OTLP to localhost:4317. Because of awsvpc, the sidecar shares the task ENI. Apps must not try to use the task's ENI IP; localhost is correct (and faster).
- The ecs detector IS enabled in Fargate. Don't carry over the "disable ecs detector" rule from setup-ecs-ec2.md; that only applies to daemonset mode.
- The hostmetrics receiver is useless here; rely on awsecscontainermetrics (available on Platform 1.4.0+ via Task Metadata V4) and ecs.task.* metadata via resourcedetection.

Summary:
- App essential: true, collector essential: false, app dependsOn: HEALTHY on the collector. Prevents the startup race.
- Apps send OTLP to localhost:4317 (shared awsvpc ENI).
- Config via --config env:OTEL_CONFIG or --config s3://....
- The ecs detector IS correct in sidecar mode (unlike daemonset).
- Tail sampling: forward to a central collector with loadbalancing on trace_id.