Conversation

@ArthurSens ArthurSens commented Aug 7, 2025

Description

We've received a bunch of indirect and direct feedback that it's hard to understand what is going on when the exporter fails to send Remote Write requests.

I'm addressing that by improving our logging around our outgoing requests.

Fixes #39703

```go
case <-ticker.C:
	// In the normal state, wIndex and rIndex differ by one. To avoid a
	// transient -1 as the final value, clamp the lag at a minimum of 0.
	lag := max(0, int64(prweWAL.wWALIndex.Load()-prweWAL.rWALIndex.Load()))
	logger.Info("recording lag log",
		zap.Int64("wIndex", int64(prweWAL.wWALIndex.Load())),
		zap.Int64("rIndex", int64(prweWAL.rWALIndex.Load())),
		zap.Int64("lag", lag))
```
ArthurSens (Member, Author):

This log line was being spammed. I don't see any value in a log line that tells us we're updating a certain metric.

}

```go
if err != nil {
	return consumererror.NewPermanent(err)
}
```
ArthurSens (Member, Author):

I'm not sure if I'm missing something, but all these NewPermanent errors only add a "Permanent error" prefix to the log message, nothing more. I removed all of those calls because having "Permanent error" in our logs doesn't seem relevant.

ArthurSens (Member, Author):

Alright, I quickly learned that this is needed for retries :). It's unfortunate that the side effect is a very polluted log experience

Contributor:

So with that in mind, are you sure about removing the consumererror.NewPermanent calls in the other places, like
https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/41856/files#diff-2631d8a8855fdc08cf077c0ab99544147ca73eec572bc871d8fe717e826eda38L359 ?

ArthurSens (Member, Author):

Good catch! I was trying to get rid of permanent errors in the WAL code and accidentally removed a bit more than intended when doing my replace.

@ArthurSens ArthurSens force-pushed the prwexporter-improve-logging branch from 632903e to 08fdc08 Compare August 7, 2025 22:22
ArthurSens (Member, Author):

Not sure why the changelog CI is failing 🤔

Signed-off-by: Arthur Silva Sens <[email protected]>
@ArthurSens ArthurSens force-pushed the prwexporter-improve-logging branch from 2f1e9d3 to 11d96fc Compare August 8, 2025 10:58
ArthurSens (Member, Author):

Alright, I think the PR is ready for review but I honestly don't understand why changelog CI is failing 😭

```go
// Reference for the different behavior per status code:
// https://github.com/prometheus/prometheus/pull/2552/files#diff-ae8db9d16d8057358e49d694522e7186
if resp.StatusCode >= 200 && resp.StatusCode < 300 {
	prwe.settings.Logger.Info("remote write request successful",
```
Contributor:

I feel like this should probably be debug-level logging? This is essentially request logging

ArthurSens (Member, Author):

Yeah, fair point. I wasn't sure which level to use; I was mostly trying to confirm that requests were being made at all.

I'll update this to debug.
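For context, the status-code handling the snippet above belongs to can be sketched as a small classifier. The `classify` helper below is hypothetical, not the exporter's real code; it follows the Prometheus convention linked in the comment: 2xx is a success (worth only a debug log), 429 and 5xx are retryable, and the remaining 4xx are permanent:

```go
package main

import (
	"fmt"
	"net/http"
)

// classify maps a remote-write response status to the action the
// exporter would take: succeed quietly, retry later, or drop the batch.
func classify(status int) string {
	switch {
	case status >= 200 && status < 300:
		return "success" // log at debug level, nothing actionable
	case status == http.StatusTooManyRequests || status >= 500:
		return "retryable" // server-side or throttling: retry with backoff
	default:
		return "permanent" // client-side error: retrying cannot help
	}
}

func main() {
	for _, s := range []int{204, 429, 400, 503} {
		fmt.Println(s, classify(s))
	}
}
```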

ArthurSens (Member, Author):

Rebasing on the latest main solved the changelog CI problem.

jmichalek132 (Contributor) left a comment:

Left one small nit, otherwise LGTM.

Signed-off-by: Arthur Silva Sens <[email protected]>
@ArthurSens
Copy link
Member Author

Based on #41992 (comment), I'm marking this PR as 'ready to merge'

@ArthurSens ArthurSens added the ready to merge Code review completed; ready to merge by maintainers label Aug 14, 2025
@songy23 songy23 merged commit d9b8851 into open-telemetry:main Aug 14, 2025
192 of 199 checks passed

Labels

exporter/prometheusremotewrite ready to merge Code review completed; ready to merge by maintainers


Development

Successfully merging this pull request may close these issues.

Exporterhelper truncating logs when prometheus remote writer fails writing metrics

4 participants