Description
Component(s)
exporter/prometheus, exporter/prometheusremotewrite
What happened?
Description
When prometheusremotewrite
fails to write metrics, the error logs emitted by exporterhelper
are truncated. This makes failure cases difficult to debug, because the log is cut off right before the metric name is shown.
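The behavior looks consistent with the error string being cut at some fixed length before it reaches the log output. A minimal sketch of that kind of cap (the `truncate` helper and the 60-byte limit are hypothetical, for illustration only; they are not the collector's actual mechanism or limit):

```go
package main

import "fmt"

// truncate caps s at limit bytes. This is a hypothetical stand-in for
// whatever length cap clips the "error" field in the exporterhelper log.
func truncate(s string, limit int) string {
	if len(s) <= limit {
		return s
	}
	return s[:limit]
}

func main() {
	err := `out of order sample. timestamp=2025-04-10T19:39:34.739Z, series={__name__="target_info"}`
	// With a cap shorter than the message, the series labels are lost --
	// exactly the part needed to identify the offending metric.
	fmt.Println(truncate(err, 60))
}
```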
Steps to Reproduce
Sending out-of-order metrics to Prometheus seems to always result in a truncated log.
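For context, Prometheus-style ingestion rejects a sample whose timestamp is not newer than the last sample accepted for the same series. A small Go sketch of that per-series check (plain types and an illustrative model, not the real TSDB logic):

```go
package main

import "fmt"

// sample is a simplified (timestamp, value) pair; timestamps are Unix millis.
type sample struct {
	ts    int64
	value float64
}

// outOfOrder reports which samples a Prometheus-style ingester would reject:
// any sample whose timestamp is <= the newest timestamp already accepted for
// the series. This mirrors the "out of order sample" error in the report.
func outOfOrder(samples []sample) []sample {
	var rejected []sample
	var newest int64 = -1 << 63
	for _, s := range samples {
		if s.ts <= newest {
			rejected = append(rejected, s)
			continue
		}
		newest = s.ts
	}
	return rejected
}

func main() {
	series := []sample{
		{ts: 1000, value: 1},
		{ts: 2000, value: 2},
		{ts: 1500, value: 3}, // arrives late: 1500 <= 2000, so it is rejected
	}
	fmt.Println(outOfOrder(series))
}
```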
Expected Result
The error log from exporterhelper
should be complete, like so:
error exporterhelper/queued_retry.go:391 Exporting failed. The error is not retryable. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: Permanent error: remote write returned HTTP status 400 Bad Request; err = %!w(<nil>): maxFailure (quorum) on a given error family, addr=10.1.REDACTED.REDACTED:9095 state=ACTIVE zone=us-east-2b, rpc error: code = Code(400) desc = user=REDACTED: err: out of order sample. timestamp=2025-04-10T19:39:34.739Z, series={__name__=\"target_info\", http_scheme=\"http\", instance=\"localhost:8888\", job=\"otel-collector\", net_host_port=\"8888\"}\n", "dropped_items": 27}
Actual Result
What we get instead is truncated JSON: the error field is cut off in the middle of the series labels.
error exporterhelper/queued_retry.go:391 Exporting failed. The error is not retryable. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: Permanent error: remote write returned HTTP status 400 Bad Request; err = %!w(<nil>): maxFailure (quorum) on a given error family, addr=10.1.REDACTED.REDACTED:9095 state=ACTIVE zone=us-east-2b, rpc error: code = Code(400) desc = user=REDACTED: err: out of order sample. timestamp=2025-04-10T19:39:34.739Z, se", "dropped_items": 27}
Collector version
v0.123.0, v0.82.0
Environment information
Environment
OS: "Amazon Linux 2023"
AWS AMI: amazon-eks-node-1.32-v20250403
OpenTelemetry Collector configuration
receivers:
  otlp:
    protocols:
      grpc: {}
  prometheus:
    config:
      scrape_configs:
        - job_name: opa
          scrape_interval: 30s
          static_configs:
            - targets:
                - localhost:8181
              labels:
                origin: opa
          metric_relabel_configs:
            - source_labels:
                - __name__
              regex: ^http_request_duration_seconds.*
              action: drop
        - job_name: otel-collector
          scrape_interval: 30s
          static_configs:
            - targets:
                - localhost:8888
exporters:
  prometheusremotewrite:
    endpoint: https://aps-workspaces.us-east-2.amazonaws.com/workspaces/<workspace-id>/api/v1/remote_write
    auth:
      authenticator: sigv4auth
    remote_write_queue:
      enabled: true
      queue_size: 100
      num_consumers: 1
    resource_to_telemetry_conversion:
      enabled: true
extensions:
  health_check: {}
  sigv4auth:
    region: us-east-2
processors:
  batch:
    send_batch_max_size: 10000
    timeout: 0s
    send_batch_size: 70
  memory_limiter:
    check_interval: 1s
    limit_percentage: 90
    spike_limit_percentage: 35
  metricstransform:
    transforms:
      - match_type: regexp
        experimental_match_labels:
          origin: opa
        include: ^(opa_)*(.*)
        action: update
        new_name: opa_$${2}
  attributes/insert:
    actions:
      - key: app_name
        value: <APP_NAME>
        action: insert
      - key: k8s_pod_name
        value: ${env:HOSTNAME}
        action: insert
service:
  extensions:
    - health_check
    - sigv4auth
  pipelines:
    metrics:
      receivers:
        - otlp
        - prometheus
      processors:
        - attributes/insert
        - batch
        - memory_limiter
        - metricstransform
      exporters:
        - prometheusremotewrite
Log output
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
go.opentelemetry.io/collector/[email protected]/exporterhelper/queued_retry.go:391
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
go.opentelemetry.io/collector/[email protected]/exporterhelper/metrics.go:125
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
go.opentelemetry.io/collector/[email protected]/exporterhelper/queued_retry.go:195
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func1
go.opentelemetry.io/collector/[email protected]/exporterhelper/internal/bounded_memory_queue.go:47
2025-04-10T19:39:35.122Z error exporterhelper/queued_retry.go:391 Exporting failed. The error is not retryable. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: Permanent error: remote write returned HTTP status 400 Bad Request; err = %!w(<nil>): maxFailure (quorum) on a given error family, addr=10.1.REDACTED.REDACTED:9095 state=ACTIVE zone=us-east-2b, rpc error: code = Code(400) desc = user=REDACTED: err: out of order sample. timestamp=2025-04-10T19:39:34.739Z, se", "dropped_items": 27}
Additional context
- The otel/opentelemetry-collector-contrib:0.123.0-arm64 image is mirrored internally to ECR.
- The Prometheus instance is AWS AMP.
- The collector is running as an application sidecar on an EKS cluster.