-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Description
Describe the bug
With batchsender, in-flight requests that are indefinitely blocking (due to e.g. retries) will block batchsender shutdown and the collector shutdown.
Batchsender shutdown is blocked by https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/batch_sender.go#L220 which in turn blocks queue sender and exporter shutdown in https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/common.go#L330C1-L334C33
This is in contrast to the case without batchsender, where retries are handled by retry_sender (which is bypassed by batchsender since batchsender calls request.Export directly), and on shutdown, retry_sender will avoid retrying and return experr.shutdownErr
if downstream senders return an error.
Steps to reproduce
Write a request with Export
that blocks indefinitely. Trigger collector shutdown. Observe that the collector hangs.
What did you expect to see?
Collector should not hang on shutdown.
In-flight requests should be signaled that the collector is shutting down, possibly return an experr.shutdownErr
in the call chain, so that the collector can exit cleanly without removing the request from persistent queue.
What did you see instead?
Collector hangs as batchsender waits for all active requests to return.
What version did you use?
v0.100.0
What config did you use?
Environment
Additional context
Q:
- Is it intentional to bypass retrysender and timeoutsender in batchsender? It was raised during code review
- Even if batchsender.Shutdown returns without polling for in-flight requests to end, and that
be.ShutdownFunc.Shutdown(ctx))
is called, how to cleanly exit the collector without losing events in persistent queue? experr.shutdownErr is in an internal package, and contrib exporter code does not have a way to avoid queue item removal from persistent queue aside from blocking queueConsume
.