Skip to content

Conversation

h0cheung
Copy link
Contributor

@h0cheung h0cheung commented Apr 23, 2024

Description:

When starting k8sattributes processor, block until an initial synchronization has been completed. This solves #32556

Link to tracking Issue:

fix #32556

Testing:

Tested in a cluster with constant high span traffic, no more spans with unassociated k8s metadata after adding this pr.

Documentation:

Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label May 10, 2024
@github-actions github-actions bot removed the Stale label May 17, 2024
@orloffv
Copy link

orloffv commented May 30, 2024

@h0cheung Hi there, I was wondering if you have the time to finalize and merge the pull request. Looking forward to your update. Thanks!

Copy link
Contributor

@fatsheep9146 fatsheep9146 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience!

I have an advice about the two new options start_timeout and error_when_timeout.

The reason for this enhancement is to make sure the k8sattributes processor will start consuming the incoming traces or metrics until the initial synchronization are completed,
otherwise, some traces/metrics/logs may be unassociated with k8s metadata.

But I think some users may care more about the start rate of collector than the completeness of the k8s metadata.

So how about we change this two fields to

  • force_wait_for_cache_synced bool default is false
  • wait_for_synced_timeout time.Duration

The users who care more about the completeness could set force_wait_for_cache_synced bool to true and if the synced is still not completed after timeout wait_for_synced_timeout, the collector should directly return error instead of running, since the completeness could not be guaranteed.

@h0cheung @TylerHelmuth

@h0cheung
Copy link
Contributor Author

h0cheung commented Jun 11, 2024

Thanks for your patience!

I have an advice about the two new options start_timeout and error_when_timeout.

The reason for this enhancement is to make sure the k8sattributes processor will start consuming the incoming traces or metrics until the initial synchronization are completed, otherwise, some traces/metrics/logs may be unassociated with k8s metadata.

But I think some users may care more about the start rate of collector than the completeness of the k8s metadata.

So how about we change this two fields to

  • force_wait_for_cache_synced bool default is false
  • wait_for_synced_timeout time.Duration

The users who care more about the completeness could set force_wait_for_cache_synced bool to true and if the synced is still not completed after timeout wait_for_synced_timeout, the collector should directly return error instead of running, since the completeness could not be guaranteed.

@h0cheung @TylerHelmuth

I agree. What are your thoughts? @TylerHelmuth

@orloffv
Copy link

orloffv commented Jun 24, 2024

@TylerHelmuth can you see, plz?

@TylerHelmuth
Copy link
Member

@fatsheep9146 I like that idea. I suggest adjusting naming it wait_for_metadata and wait_for_metadata_timeout to hide some of the details from the end user. I believe we use the term metadata already to refer to the pod data that is gathered and held.

Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jul 17, 2024
@h0cheung
Copy link
Contributor Author

This PR was marked stale due to lack of activity. It will be closed in 14 days.

remove stale

@h0cheung
Copy link
Contributor Author

@fatsheep9146 I like that idea. I suggest adjusting naming it wait_for_metadata and wait_for_metadata_timeout to hide some of the details from the end user. I believe we use the term metadata already to refer to the pod data that is gathered and held.

Thanks. These two configuration items look simple and clear. I think we can use them.

@mx-psi mx-psi removed the Stale label Jul 25, 2024
@valner
Copy link

valner commented Aug 7, 2024

@h0cheung Hi, could you take some time to work out the code and merge the pull request? I’d really appreciate your update on this. Thanks a lot

Copy link
Member

@ChrsMark ChrsMark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor wording improvement :).

@h0cheung h0cheung force-pushed the feat-k8sblock branch 4 times, most recently from 767326c to 73065a6 Compare November 7, 2024 03:08
@ChrsMark
Copy link
Member

ChrsMark commented Nov 7, 2024

@fatsheep9146 @dmitryax could you take a look as well? Otherwise this should be good to go.

@TylerHelmuth
Copy link
Member

@fatsheep9146 @dmitryax I plan to merge by end of week, please comment if you have objections

Copy link
Contributor

@fatsheep9146 fatsheep9146 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks

@h0cheung h0cheung force-pushed the feat-k8sblock branch 3 times, most recently from 26a7ef7 to