
awsprometheusremotewrite exporter logs - exemplar missing labels #959

@gautam-nutalapati

Description

Describe the question
Why does the awsprometheusremotewrite exporter in aws-otel-collector throw the error below?

"error": "Permanent error: remote write returned HTTP status 400 Bad Request; err = <nil>: exemplar missing labels, timestamp: 1643818281979 series: {__name__=\"http_client_duration_bucket\", http_flavor=\"1.1\", http_method=\"GET\", http_status_code=\"200\", http_url=\"http://169.254.170.2/v2/credentials/5f993586-e2c0-4a1d-91d0-e48ba719e22a\", le=\"5\"} la"

Steps to reproduce if your question is related to an action

  • Run a web application instrumented with a histogram and the aws otel java agent 1.10.0
    • Set the OTEL_TRACES_EXPORTER and OTEL_METRICS_EXPORTER env vars to otel
  • Run the aws otel collector sidecar v1.16.0 with the awsprometheusremotewrite exporter
  • After making a request, we can see the aws otel collector printing a metric like http_client_duration_bucket, which is a histogram, and the above error in the collector logs.
  • Also, on the aws-otel-collector metrics endpoint, I see:
  # HELP otelcol_exporter_send_failed_metric_points Number of metric points in failed attempts to send to destination.
  # TYPE otelcol_exporter_send_failed_metric_points counter
  otelcol_exporter_send_failed_metric_points{exporter="awsprometheusremotewrite",service_instance_id="68d4a161-aa29-4a1e-87df-26cd719c62d6",service_version="latest"} 1086

What did you expect to see?
Since I do see metrics going to Grafana and traces to X-Ray, I expect no error message to be printed.

Environment
NA

Additional context
It seems like this error is thrown by Prometheus when trace information is not tied to metrics,
e.g. a metric with exemplar information (from the linked example):
my_histogram_bucket{le="0.5"} 205 # {TraceID="b94cc547624c3062e17d743db422210e"} 0.175XXX 1.6XXX
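
For context, below is a minimal Go sketch of the kind of check a remote-write receiver applies to incoming exemplars. It is an illustration under assumptions, not the actual AMP/Cortex server code, and it only uses the prompb types from github.com/prometheus/prometheus/prompb. It shows why an exemplar that gets serialized with an empty label set causes the whole write to be rejected with HTTP 400:

package main

import (
	"fmt"

	"github.com/prometheus/prometheus/prompb"
)

// validateExemplars mimics the kind of check a remote-write receiver applies:
// every exemplar on a series must carry at least one label (typically the
// trace_id/span_id pair). Illustrative sketch only, not the real server code.
func validateExemplars(ts prompb.TimeSeries) error {
	for _, ex := range ts.Exemplars {
		if len(ex.Labels) == 0 {
			// The receiver rejects the write with 400, which the collector
			// surfaces as a Permanent (non-retryable) error and drops the batch.
			return fmt.Errorf("exemplar missing labels, timestamp: %d series: %v",
				ex.Timestamp, ts.Labels)
		}
	}
	return nil
}

func main() {
	bad := prompb.TimeSeries{
		Labels: []prompb.Label{
			{Name: "__name__", Value: "http_client_duration_bucket"},
			{Name: "le", Value: "5"},
		},
		// An exemplar with a value and timestamp but no labels: this is what
		// shows up as "labels: {}" in the collector error logs.
		Exemplars: []prompb.Exemplar{{Value: 0.17, Timestamp: 1643818281979}},
	}
	fmt.Println(validateExemplars(bad))
}

If the agent recorded exemplars whose trace/span IDs were never attached, the serialized labels could end up empty, which would match the "labels: {}" visible in the error log further below.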

Can this error be ignored? Or am I missing some configuration that is causing this error? I cannot find much information online about this.
I don't need traces to be tied to metrics.

OTEL-Collector configuration:

extensions:
  health_check:
  pprof:
    endpoint: :1777
  zpages:
    endpoint: :55679

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  # https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/memorylimiterprocessor/README.md
  memory_limiter:
    check_interval: 1s
    limit_percentage: 50
    spike_limit_percentage: 30
  batch/traces:
    timeout: 10s
    send_batch_size: 50
  batch/metrics:
    timeout: 10s

exporters:
  awsxray:
    region: "${AWS_REGION}"
  awsprometheusremotewrite:
    endpoint: "${PROMETHEUS_WRITE_ENDPOINT}"
    aws_auth:
      service: "aps"
      region: "${AWS_REGION}"
  prometheus:
    endpoint: "0.0.0.0:8889"
service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch/traces]
      exporters: [awsxray]
    metrics:
      receivers: [otlp]
      processors: [batch/metrics]
      exporters: [awsprometheusremotewrite]
    # Pipeline to send metrics to local prometheus workspace
    metrics/2:
      receivers: [otlp]
      processors: [batch/metrics]
      exporters: [prometheus]

Update:
I have kept forwarding the metric despite this error, to test it out more.
The error seems to affect the export of just one bucket out of all the buckets (see the sketch after the bucket listing below).
I configured the aws otel collector to forward metrics to both the prometheus exporter and the prometheusremotewrite exporter.
On the prometheus endpoint exposed by the aws-otel-collector, I see the data below for the histogram:

api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="5"} 0
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="10"} 0
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="25"} 0
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="50"} 0
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="75"} 0
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="100"} 0
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="250"} 0
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="500"} 0
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="750"} 0
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="1000"} 0
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="2500"} 43
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="5000"} 44
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="7500"} 44
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="10000"} 44
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="+Inf"} 44
api_latency_sum{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc"} 70168
api_latency_count{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc"} 44
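
A possible explanation for why only one bucket series fails (again, a sketch under assumptions, not the collector's actual translator code): when a histogram data point is converted to remote-write series, each exemplar is, as far as I can tell, attached only to the bucket whose upper bound first covers the exemplar value, so a single bad exemplar knocks out exactly one le="..." series:

package main

import "fmt"

// Hypothetical, simplified types; not the collector's pdata API.
type exemplar struct {
	value   float64
	traceID string // empty trace ID -> exemplar ends up with no labels
}

type histogramPoint struct {
	bounds    []float64 // explicit bucket upper bounds
	exemplars []exemplar
}

// bucketForExemplar returns the index of the first bucket whose upper bound
// covers the exemplar value; that is the only bucket series the exemplar is
// attached to when the point is written out.
func bucketForExemplar(bounds []float64, v float64) int {
	for i, b := range bounds {
		if v <= b {
			return i
		}
	}
	return len(bounds) // +Inf bucket
}

func main() {
	p := histogramPoint{
		bounds:    []float64{5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000},
		exemplars: []exemplar{{value: 1800, traceID: ""}}, // ~1.8s request, no trace recorded
	}
	for _, ex := range p.exemplars {
		i := bucketForExemplar(p.bounds, ex.value)
		le := "+Inf"
		if i < len(p.bounds) {
			le = fmt.Sprintf("%g", p.bounds[i])
		}
		fmt.Printf("exemplar value=%g attaches to le=%q; empty traceID=%t\n",
			ex.value, le, ex.traceID == "")
	}
	// With these bounds the exemplar lands only on le="2500", matching the
	// single rejected series in the error log below.
}

If that is how the exporter behaves, it would also explain why the sum/count series and all the other buckets still make it to AMP.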

But in Grafana, the histogram I plot looks as below:
(screenshot: grafana-missing-bucket)
As we can see, AMP is missing the data for one bucket.
The related error shows data being dropped for this bucket:

2022-02-07T21:33:26.114Z	error	exporterhelper/queued_retry.go:183	Exporting failed. The error is not retryable. Dropping data.	{"kind": "exporter", "name": "awsprometheusremotewrite", "error": "Permanent error: remote write returned HTTP status 400 Bad Request; err = <nil>: exemplar missing labels, timestamp: 1644267527143 series: {__name__=\"api_latency_bucket\", api_method=\"GET\", api_name=\"/users/v1/profiles/me\", env=\"dev-local\", le=\"2500\", status_code=\"500\", svc=\"user-profile-svc\", test=\"gautam\"} labels: {}\n", "dropped_items": 39}
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:183
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/metrics.go:134
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry_inmemory.go:105
go.opentelemetry.io/collector/exporter/exporterhelper/internal.consumerFunc.consume
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/internal/bounded_memory_queue.go:99
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func2
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/internal/bounded_memory_queue.go:78

In addition, the metrics below are published to prometheus but are being dropped when writing to AMP. These are default HTTP metrics generated by the aws otel java agent: http_client_duration_bucket and http_server_duration_bucket

2022-02-07T21:38:25.801Z	error	exporterhelper/queued_retry.go:183	Exporting failed. The error is not retryable. Dropping data.	{"kind": "exporter", "name": "awsprometheusremotewrite", "error": "Permanent error: remote write returned HTTP status 400 Bad Request; err = <nil>: exemplar missing labels, timestamp: 1644267527142 series: {__name__=\"http_client_duration_bucket\", env=\"gautam-dev\", http_flavor=\"1.1\", http_method=\"GET\", http_url=\"http://localhost:9900/stux/v1/users/97378103842048256\", le=\"5\", svc=\"user-profile-svc\", tes", "dropped_items": 39}
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:183
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/metrics.go:134
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry_inmemory.go:105
go.opentelemetry.io/collector/exporter/exporterhelper/internal.consumerFunc.consume
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/internal/bounded_memory_queue.go:99
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func2
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/internal/bounded_memory_queue.go:78
2022-02-07T21:39:25.868Z	error	exporterhelper/queued_retry.go:183	Exporting failed. The error is not retryable. Dropping data.	{"kind": "exporter", "name": "awsprometheusremotewrite", "error": "Permanent error: remote write returned HTTP status 400 Bad Request; err = <nil>: exemplar missing labels, timestamp: 1644265641009 series: {__name__=\"http_server_duration_bucket\", env=\"gautam-dev\", http_flavor=\"1.1\", http_host=\"localhost:8060\", http_method=\"GET\", http_scheme=\"http\", http_status_code=\"403\", le=\"750\", svc=\"user-profile-s", "dropped_items": 39}
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:183
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/metrics.go:134
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry_inmemory.go:105
go.opentelemetry.io/collector/exporter/exporterhelper/internal.consumerFunc.consume
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/internal/bounded_memory_queue.go:99
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func2
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/internal/bounded_memory_queue.go:78
