Skip to content

Conversation

VihasMakwana
Copy link
Contributor

@VihasMakwana VihasMakwana commented Jun 20, 2025

Description

The run() goroutine closes the channel in defer.
There are two sceanrios:

  1. Best case scenario: no error encountered while calling ConsumeMetrics.
  2. Worst case scenario: we encounter an error while calling ConsumeMetrics, causing the goroutine to exit early and as a result, close the c.exit chan.

We close the channel, again, in Shutdown and this results in a "send on closed channel" panic in worst case scenario. In best case scenario, no panic is encountered.

This PR adds waitgroup to avoid the panic.

Link to tracking issue

Fixes #40845

Testing

Added a test case

@jade-guiton-dd
Copy link
Contributor

I like the fix, but I'm wondering if it makes sense to return from the goroutine when the next consumer returns an error (code link) in the first place, instead of continueing the loop like the case of a failed conversion 🤔 Doesn't this mean that any error in the metrics pipeline will cause the connector to stop working? @songy23

@VihasMakwana
Copy link
Contributor Author

VihasMakwana commented Jun 23, 2025

Doesn't this mean that any error in the metrics pipeline will cause the connector to stop working?

Yes @jade-guiton-dd. You're right.
If there's any error in the pipeline, the connector will stop working.
We can just continue like we do here

mx, err = c.translator.StatsToMetrics(stats)
if err != nil {
c.logger.Error("Failed to convert stats to metrics", zap.Error(err))
continue
}

and just log the error.

Let me know if it's good @songy23

Copy link
Member

@songy23 songy23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @VihasMakwana LGTM

@songy23 songy23 merged commit 20c2fd5 into open-telemetry:main Jun 23, 2025
177 checks passed
@github-actions github-actions bot added this to the next release milestone Jun 23, 2025
@jade-guiton-dd
Copy link
Contributor

@songy23 The reason I pinged you was to ask whether it would make sense to replace the whole PR with just returncontinue in the loop, since you wrote this code

Dylan-M pushed a commit to Dylan-M/opentelemetry-collector-contrib that referenced this pull request Aug 5, 2025
…waiting on it (open-telemetry#40849)

#### Description

The `run()` goroutine closes the channel in defer. 
There are two sceanrios:
1. Best case scenario: no error encountered while calling
`ConsumeMetrics`.
2. Worst case scenario: we encounter an error while calling
`ConsumeMetrics`, causing the goroutine to exit early and as a result,
close the `c.exit` chan.

We close the channel, again, in `Shutdown` and this results in a "send
on closed channel" panic in worst case scenario. In best case scenario,
no panic is encountered.

This PR adds waitgroup to avoid the panic.

<!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. -->
#### Link to tracking issue
Fixes open-telemetry#40845

<!--Describe what testing was performed and which tests were added.-->
#### Testing

Added a test case
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[connector/datadogconnector] panic when receive terminated signal

4 participants