Describe the question
Why does the awsprometheusremotewrite exporter in the aws-otel-collector throw the error below:
"error": "Permanent error: remote write returned HTTP status 400 Bad Request; err = <nil>: exemplar missing labels, timestamp: 1643818281979 series: {__name__=\"http_client_duration_bucket\", http_flavor=\"1.1\", http_method=\"GET\", http_status_code=\"200\", http_url=\"http://169.254.170.2/v2/credentials/5f993586-e2c0-4a1d-91d0-e48ba719e22a\", le=\"5\"} la"
Steps to reproduce if your question is related to an action
- Run a web application instrumented with a histogram and the aws otel java agent 1.10.0.
- Set the OTEL_TRACES_EXPORTER and OTEL_METRICS_EXPORTER env vars to otel.
- Run the aws otel collector sidecar v1.16.0 with awsprometheusremotewrite.
- After making a request, we can see the aws otel collector printing a metric like http_client_duration_bucket (which is a histogram) and the above error in the collector logs.
- Also, on the aws-otel-collector metrics endpoint, I see the following (a small sketch for watching this counter follows the output):
# HELP otelcol_exporter_send_failed_metric_points Number of metric points in failed attempts to send to destination.
# TYPE otelcol_exporter_send_failed_metric_points counter
otelcol_exporter_send_failed_metric_points{exporter="awsprometheusremotewrite",service_instance_id="68d4a161-aa29-4a1e-87df-26cd719c62d6",service_version="latest"} 1086
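As a quick way to watch that counter grow, here is a minimal sketch, assuming the collector's self-telemetry is exposed on the default :8888/metrics endpoint (this address is an assumption and may differ per deployment); it scrapes the endpoint and prints only the failed-send counter for the remote write exporter.

package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Assumption: the collector exposes its own metrics on :8888/metrics.
	resp, err := http.Get("http://localhost:8888/metrics")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Print only the failed-send counter for the remote write exporter.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "otelcol_exporter_send_failed_metric_points") &&
			strings.Contains(line, `exporter="awsprometheusremotewrite"`) {
			fmt.Println(line)
		}
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
}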
What did you expect to see?
Since I do see metrics going to Grafana and traces to X-Ray, I expect no error message to be printed.
Environment
NA
Additional context
It seems like this error is thrown by Prometheus when trace information is not tied to the metric.
e.g. a metric with exemplar information, from the link:
my_histogram_bucket{le="0.5"} 205 # {TraceID="b94cc547624c3062e17d743db422210e"} 0.175XXX 1.6XXX
Can this error be ignored? Or am I missing some configuration that is causing this error? I cannot find much information about this online.
I don't need trace to be tied to metric.
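For context on what the write endpoint is objecting to, here is a minimal sketch in Go. This is not the collector's or the exporter's actual code; it only illustrates what a remote-write payload looks like when a histogram bucket series carries an exemplar with no labels versus one with a trace_id label, which as far as I can tell is the difference that triggers the 400. It assumes a local Prometheus with the remote-write receiver feature enabled on localhost:9090 (AMP additionally requires SigV4 signing, which is omitted here), and uses the real prompb, gogo/protobuf and snappy packages.

package main

import (
	"bytes"
	"fmt"
	"net/http"

	"github.com/gogo/protobuf/proto"
	"github.com/golang/snappy"
	"github.com/prometheus/prometheus/prompb"
)

// buildPayload builds a snappy-compressed remote-write request containing a
// single histogram bucket series with one exemplar attached.
func buildPayload(exemplarLabels []prompb.Label) ([]byte, error) {
	wr := &prompb.WriteRequest{
		Timeseries: []prompb.TimeSeries{{
			Labels: []prompb.Label{
				{Name: "__name__", Value: "http_client_duration_bucket"},
				{Name: "http_method", Value: "GET"},
				{Name: "le", Value: "5"},
			},
			Samples: []prompb.Sample{{Value: 1, Timestamp: 1643818281979}},
			Exemplars: []prompb.Exemplar{{
				Labels:    exemplarLabels, // nil/empty -> "exemplar missing labels"
				Value:     3.2,
				Timestamp: 1643818281979,
			}},
		}},
	}
	raw, err := proto.Marshal(wr)
	if err != nil {
		return nil, err
	}
	return snappy.Encode(nil, raw), nil
}

func send(name string, body []byte) {
	// Assumption: a local Prometheus remote-write receiver, not AMP.
	req, err := http.NewRequest(http.MethodPost, "http://localhost:9090/api/v1/write", bytes.NewReader(body))
	if err != nil {
		fmt.Println(name, "error:", err)
		return
	}
	req.Header.Set("Content-Type", "application/x-protobuf")
	req.Header.Set("Content-Encoding", "snappy")
	req.Header.Set("X-Prometheus-Remote-Write-Version", "0.1.0")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println(name, "error:", err)
		return
	}
	resp.Body.Close()
	fmt.Println(name, "->", resp.Status)
}

func main() {
	// Exemplar with no labels at all (what the exporter appears to be sending
	// when no trace is attached) versus one carrying a trace_id label.
	noLabels, _ := buildPayload(nil)
	withTraceID, _ := buildPayload([]prompb.Label{{Name: "trace_id", Value: "b94cc547624c3062e17d743db422210e"}})

	send("exemplar-without-labels", noLabels) // expected to be rejected with 400
	send("exemplar-with-trace_id", withTraceID)
}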
OTEL-Collector configuration:
extensions:
  health_check:
  pprof:
    endpoint: :1777
  zpages:
    endpoint: :55679
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  # https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/memorylimiterprocessor/README.md
  memory_limiter:
    check_interval: 1s
    limit_percentage: 50
    spike_limit_percentage: 30
  batch/traces:
    timeout: 10s
    send_batch_size: 50
  batch/metrics:
    timeout: 10s
exporters:
  awsxray:
    region: "${AWS_REGION}"
  awsprometheusremotewrite:
    endpoint: "${PROMETHEUS_WRITE_ENDPOINT}"
    aws_auth:
      service: "aps"
      region: "${AWS_REGION}"
  prometheus:
    endpoint: "0.0.0.0:8889"
service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch/traces]
      exporters: [awsxray]
    metrics:
      receivers: [otlp]
      processors: [batch/metrics]
      exporters: [awsprometheusremotewrite]
    # Pipeline to send metrics to local prometheus workspace
    metrics/2:
      receivers: [otlp]
      processors: [batch/metrics]
      exporters: [prometheus]
Update:
I have kept the metric forwarding running despite this error to test it out more.
The error seems to affect the export of just one bucket out of all the buckets (broken down in the sketch after the data below).
I configured the aws otel collector to forward metrics to both the prometheus and the awsprometheusremotewrite exporters.
On the prometheus endpoint exposed by the aws-otel-collector, I see the following data for the histogram:
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="5"} 0
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="10"} 0
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="25"} 0
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="50"} 0
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="75"} 0
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="100"} 0
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="250"} 0
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="500"} 0
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="750"} 0
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="1000"} 0
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="2500"} 43
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="5000"} 44
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="7500"} 44
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="10000"} 44
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="+Inf"} 44
api_latency_sum{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc"} 70168
api_latency_count{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc"} 44
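To make the bucket layout explicit, here is a small sketch that just derives per-bucket counts from the cumulative counts above; only two buckets actually contain observations, and the series rejected in the error further down (le="2500") is one of them. My working assumption (not confirmed by the exporter docs) is that the exporter attaches an exemplar to the bucket series whose bounds contain the exemplar value, which would explain why only that one bucket series is rejected while the rest are written.

package main

import "fmt"

func main() {
	// Cumulative counts copied from the api_latency_bucket exposition above.
	bounds := []string{"5", "10", "25", "50", "75", "100", "250", "500", "750", "1000", "2500", "5000", "7500", "10000", "+Inf"}
	cumulative := []float64{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 43, 44, 44, 44, 44}

	// Prometheus histogram buckets are cumulative, so the number of
	// observations that landed in an individual bucket is the difference
	// from the previous bucket's count.
	prev := 0.0
	for i, le := range bounds {
		delta := cumulative[i] - prev
		prev = cumulative[i]
		if delta > 0 {
			fmt.Printf("bucket le=%q holds %v observation(s)\n", le, delta)
		}
	}
	// Prints only le="2500" (43 observations) and le="5000" (1 observation);
	// all other buckets are empty.
}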
But in Grafana, the histogram I plot looks as below.
As we can see, AMP is missing data for one bucket.
The related error shows data being dropped for this bucket:
2022-02-07T21:33:26.114Z error exporterhelper/queued_retry.go:183 Exporting failed. The error is not retryable. Dropping data. {"kind": "exporter", "name": "awsprometheusremotewrite", "error": "Permanent error: remote write returned HTTP status 400 Bad Request; err = <nil>: exemplar missing labels, timestamp: 1644267527143 series: {__name__=\"api_latency_bucket\", api_method=\"GET\", api_name=\"/users/v1/profiles/me\", env=\"dev-local\", le=\"2500\", status_code=\"500\", svc=\"user-profile-svc\", test=\"gautam\"} labels: {}\n", "dropped_items": 39}
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:183
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
go.opentelemetry.io/[email protected]/exporter/exporterhelper/metrics.go:134
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry_inmemory.go:105
go.opentelemetry.io/collector/exporter/exporterhelper/internal.consumerFunc.consume
go.opentelemetry.io/[email protected]/exporter/exporterhelper/internal/bounded_memory_queue.go:99
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func2
go.opentelemetry.io/[email protected]/exporter/exporterhelper/internal/bounded_memory_queue.go:78
In addition, the metrics below are published to prometheus but are being dropped when writing to AMP. These are the default HTTP metrics generated by the aws otel java agent: http_client_duration_bucket and http_server_duration_bucket.
2022-02-07T21:38:25.801Z error exporterhelper/queued_retry.go:183 Exporting failed. The error is not retryable. Dropping data. {"kind": "exporter", "name": "awsprometheusremotewrite", "error": "Permanent error: remote write returned HTTP status 400 Bad Request; err = <nil>: exemplar missing labels, timestamp: 1644267527142 series: {__name__=\"http_client_duration_bucket\", env=\"gautam-dev\", http_flavor=\"1.1\", http_method=\"GET\", http_url=\"http://localhost:9900/stux/v1/users/97378103842048256\", le=\"5\", svc=\"user-profile-svc\", tes", "dropped_items": 39}
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:183
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
go.opentelemetry.io/[email protected]/exporter/exporterhelper/metrics.go:134
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry_inmemory.go:105
go.opentelemetry.io/collector/exporter/exporterhelper/internal.consumerFunc.consume
go.opentelemetry.io/[email protected]/exporter/exporterhelper/internal/bounded_memory_queue.go:99
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func2
go.opentelemetry.io/[email protected]/exporter/exporterhelper/internal/bounded_memory_queue.go:78
2022-02-07T21:39:25.868Z error exporterhelper/queued_retry.go:183 Exporting failed. The error is not retryable. Dropping data. {"kind": "exporter", "name": "awsprometheusremotewrite", "error": "Permanent error: remote write returned HTTP status 400 Bad Request; err = <nil>: exemplar missing labels, timestamp: 1644265641009 series: {__name__=\"http_server_duration_bucket\", env=\"gautam-dev\", http_flavor=\"1.1\", http_host=\"localhost:8060\", http_method=\"GET\", http_scheme=\"http\", http_status_code=\"403\", le=\"750\", svc=\"user-profile-s", "dropped_items": 39}
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:183
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
go.opentelemetry.io/[email protected]/exporter/exporterhelper/metrics.go:134
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry_inmemory.go:105
go.opentelemetry.io/collector/exporter/exporterhelper/internal.consumerFunc.consume
go.opentelemetry.io/[email protected]/exporter/exporterhelper/internal/bounded_memory_queue.go:99
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func2
go.opentelemetry.io/[email protected]/exporter/exporterhelper/internal/bounded_memory_queue.go:78