
Exporterhelper truncating logs when prometheus remote writer fails writing metrics #39703

@rancorzinho

Component(s)

exporter/prometheus, exporter/prometheusremotewrite

What happened?

Description

When prometheusremotewrite fails to write metrics, the error logs emitted by exporterhelper are truncated. This makes it hard to debug failures because the log is cut off right before the metric name is shown.

Steps to Reproduce

Sending out-of-order metrics to Prometheus seems to always result in a truncated log.
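
For reference, a minimal Go sketch of one way to produce such samples is below: export the same series twice over OTLP gRPC, with the second data point stamped a minute earlier than the first, so the backend rejects it as out of order. The endpoint (localhost:4317) and the metric name are assumptions for illustration, not taken from the configuration further down.

package main

import (
	"context"
	"log"
	"time"

	"go.opentelemetry.io/collector/pdata/pcommon"
	"go.opentelemetry.io/collector/pdata/pmetric"
	"go.opentelemetry.io/collector/pdata/pmetric/pmetricotlp"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// buildGauge returns a single gauge data point stamped with the given time.
func buildGauge(ts time.Time) pmetric.Metrics {
	md := pmetric.NewMetrics()
	sm := md.ResourceMetrics().AppendEmpty().ScopeMetrics().AppendEmpty()
	m := sm.Metrics().AppendEmpty()
	m.SetName("repro_out_of_order_gauge") // hypothetical metric name
	dp := m.SetEmptyGauge().DataPoints().AppendEmpty()
	dp.SetTimestamp(pcommon.NewTimestampFromTime(ts))
	dp.SetDoubleValue(1)
	return md
}

func main() {
	conn, err := grpc.NewClient("localhost:4317", grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	client := pmetricotlp.NewGRPCClient(conn)

	now := time.Now()
	// Export a current sample first, then one stamped a minute earlier for
	// the same series; the backend should reject the second as out of order.
	for _, ts := range []time.Time{now, now.Add(-time.Minute)} {
		req := pmetricotlp.NewExportRequestFromMetrics(buildGauge(ts))
		if _, err := client.Export(context.Background(), req); err != nil {
			log.Printf("export failed: %v", err)
		}
	}
}

Running this against a collector configured as below should surface the truncated "out of order sample" error in the collector log.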

Expected Result

The error log from exporterhelper should be complete, like so:

error exporterhelper/queued_retry.go:391 Exporting failed. The error is not retryable. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: Permanent error: remote write returned HTTP status 400 Bad Request; err = %!w(<nil>): maxFailure (quorum) on a given error family, addr=10.1.REDACTED.REDACTED:9095 state=ACTIVE zone=us-east-2b, rpc error: code = Code(400) desc = user=REDACTED: err: out of order sample. timestamp=2025-04-10T19:39:34.739Z, series={__name__=\"target_info\", http_scheme=\"http\", instance=\"localhost:8888\", job=\"otel-collector\", net_host_port=\"8888\"}\n", "dropped_items": 27}

Actual Result

What we get instead is a log line whose error string is cut off part-way through the series labels:

error exporterhelper/queued_retry.go:391 Exporting failed. The error is not retryable. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: Permanent error: remote write returned HTTP status 400 Bad Request; err = %!w(<nil>): maxFailure (quorum) on a given error family, addr=10.1.REDACTED.REDACTED:9095 state=ACTIVE zone=us-east-2b, rpc error: code = Code(400) desc = user=REDACTED: err: out of order sample. timestamp=2025-04-10T19:39:34.739Z, se", "dropped_items": 27}
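
One possibility (an assumption on my part, not verified against the exporter source) is that the cut happens when the error string is built from the HTTP response body, since remote-write clients commonly cap how much of a non-2xx body they read. A minimal Go sketch of that pattern is below; the constant name, its value, and the test server are purely illustrative.

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// maxErrMsgLen caps how much of the HTTP error body is copied into the error
// message. Name and value are illustrative, not the exporter's actual constant.
const maxErrMsgLen = 128

// buildRemoteWriteError mimics how a remote-write client commonly turns a
// non-2xx response into an error: it reads at most maxErrMsgLen bytes of the
// body, so anything past the cap (here, most of the series labels) is dropped.
func buildRemoteWriteError(rsp *http.Response) error {
	body, _ := io.ReadAll(io.LimitReader(rsp.Body, maxErrMsgLen))
	return fmt.Errorf("remote write returned HTTP status %s: %s", rsp.Status, strings.TrimSpace(string(body)))
}

func main() {
	// Simulate a backend 400 whose body carries the full out-of-order detail,
	// similar to what AMP returns.
	longBody := `user=x: err: out of order sample. timestamp=2025-04-10T19:39:34.739Z, ` +
		`series={__name__="target_info", http_scheme="http", instance="localhost:8888", ` +
		`job="otel-collector", net_host_port="8888"}`
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		http.Error(w, longBody, http.StatusBadRequest)
	}))
	defer srv.Close()

	rsp, err := http.Post(srv.URL, "application/x-protobuf", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer rsp.Body.Close()

	// Prints an error that stops part-way through the series labels, matching
	// the truncation seen above.
	fmt.Println(buildRemoteWriteError(rsp))
}

If something like this is what is happening, the error string is already cut short before it ever reaches exporterhelper's logger, which would also explain why the surrounding JSON in the log line is intact and only the embedded error value is truncated.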

Collector version

v0.123.0, v0.82.0

Environment information

Environment

OS: "Amazon Linux 2023"
AWS AMI: amazon-eks-node-1.32-v20250403

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc: {}
  prometheus:
    config:
      scrape_configs:
        - job_name: opa
          scrape_interval: 30s
          static_configs:
            - targets:
                - localhost:8181
              labels:
                origin: opa
          metric_relabel_configs:
            - source_labels:
                - __name__
              regex: ^http_request_duration_seconds.*
              action: drop
        - job_name: otel-collector
          scrape_interval: 30s
          static_configs:
            - targets:
                - localhost:8888
exporters:
  prometheusremotewrite:
    endpoint: https://aps-workspaces.us-east-2.amazonaws.com/workspaces/<workspace-id>/api/v1/remote_write
    auth:
      authenticator: sigv4auth
    remote_write_queue:
      enabled: true
      queue_size: 100
      num_consumers: 1
    resource_to_telemetry_conversion:
      enabled: true
extensions:
  health_check: {}
  sigv4auth:
    region: us-east-2
processors:
  batch:
    send_batch_max_size: 10000
    timeout: 0s
    send_batch_size: 70
  memory_limiter:
    check_interval: 1s
    limit_percentage: 90
    spike_limit_percentage: 35
  metricstransform:
    transforms:
      - match_type: regexp
        experimental_match_labels:
          origin: opa
        include: ^(opa_)*(.*)
        action: update
        new_name: opa_$${2}
  attributes/insert:
    actions:
      - key: app_name
        value: <APP_NAME>
        action: insert
      - key: k8s_pod_name
        value: ${env:HOSTNAME}
        action: insert
service:
  extensions:
    - health_check
    - sigv4auth
  pipelines:
    metrics:
      receivers:
        - otlp
        - prometheus
      processors:
        - attributes/insert
        - batch
        - memory_limiter
        - metricstransform
      exporters:
        - prometheusremotewrite

Log output

go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
	go.opentelemetry.io/collector/exporter@v0.82.0/exporterhelper/queued_retry.go:391
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
	go.opentelemetry.io/collector/exporter@v0.82.0/exporterhelper/metrics.go:125
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
	go.opentelemetry.io/collector/exporter@v0.82.0/exporterhelper/queued_retry.go:195
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func1
	go.opentelemetry.io/collector/exporter@v0.82.0/exporterhelper/internal/bounded_memory_queue.go:47
2025-04-10T19:39:35.122Z	error	exporterhelper/queued_retry.go:391	Exporting failed. The error is not retryable. Dropping data.	{"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: Permanent error: remote write returned HTTP status 400 Bad Request; err = %!w(<nil>): maxFailure (quorum) on a given error family, addr=10.1.REDACTED.REDACTED:9095 state=ACTIVE zone=us-east-2b, rpc error: code = Code(400) desc = user=REDACTED: err: out of order sample. timestamp=2025-04-10T19:39:34.739Z, se", "dropped_items": 27}

Additional context

  • The otel/opentelemetry-collector-contrib:0.123.0-arm64 image is mirrored internally to ECR.
  • The Prometheus backend is AWS Managed Service for Prometheus (AMP).
  • The collector runs as an application sidecar on an EKS cluster.
