Skip to content

[exporterhelper] batchsender casues indefinitely blocking in-flight requests to block collector shutdown #10166

@carsonip

Description

@carsonip

Describe the bug

With batchsender, in-flight requests that are indefinitely blocking (due to e.g. retries) will block batchsender shutdown and the collector shutdown.

Batchsender shutdown is blocked by https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/batch_sender.go#L220 which in turn blocks queue sender and exporter shutdown in https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/common.go#L330C1-L334C33

This is in contrast to the case without batchsender, where retries are handled by retry_sender (which is bypassed by batchsender since batchsender calls request.Export directly), and on shutdown, retry_sender will avoid retrying and return experr.shutdownErr if downstream senders return an error.

Steps to reproduce

Write a request with Export that blocks indefinitely. Trigger collector shutdown. Observe that the collector hangs.

What did you expect to see?

Collector should not hang on shutdown.

In-flight requests should be signaled that the collector is shutting down, possibly return an experr.shutdownErr in the call chain, so that the collector can exit cleanly without removing the request from persistent queue.

What did you see instead?

Collector hangs as batchsender waits for all active requests to return.

What version did you use?

v0.100.0

What config did you use?

Environment

Additional context

Q:

  • Is it intentional to bypass retrysender and timeoutsender in batchsender? It was raised during code review
  • Even if batchsender.Shutdown returns without polling for in-flight requests to end, and that be.ShutdownFunc.Shutdown(ctx)) is called, how to cleanly exit the collector without losing events in persistent queue? experr.shutdownErr is in an internal package, and contrib exporter code does not have a way to avoid queue item removal from persistent queue aside from blocking queue Consume.

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions