Added workers for queue processing #2902
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2902 +/- ##
==========================================
- Coverage 91.91% 91.63% -0.29%
==========================================
Files 494 477 -17
Lines 23939 23324 -615
==========================================
- Hits 22003 21372 -631
- Misses 1429 1451 +22
+ Partials 507 501 -6
I like the idea of this change and I think your logic is sound, but do you have evidence that this indeed brings performance improvements? I was planning on doing some similar changes as part of the perf tests that I have here: https://github.com/jpkrohling/groupbytrace-tailbasedsampling-perf-comparison. I won't have time to run the perf tests for the next couple of weeks, but if you could run the perf comparison above and share your findings, it would be a good start!
This PR was marked stale due to lack of activity. It will be closed in 7 days.
The function name is now outdated.
This doc isn't accurate anymore.
Perhaps update the doc here to mention consume(trace) instead? Also mention that the batch is split per trace, and that traces are routed to specific workers based on the trace ID.
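A minimal sketch of the behaviour being described, added for illustration only; the span and batch types are stand-ins rather than the collector's real pdata API, and the FNV hash is just an assumed routing function:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// span is a stand-in for a real span; only the trace ID matters for routing.
type span struct {
	traceID [16]byte
	name    string
}

// splitByTrace groups the spans of one incoming batch by trace ID, so each
// trace can be handed to a worker as a unit.
func splitByTrace(batch []span) map[[16]byte][]span {
	groups := make(map[[16]byte][]span)
	for _, s := range batch {
		groups[s.traceID] = append(groups[s.traceID], s)
	}
	return groups
}

// workerFor maps a trace ID deterministically to a worker index, so all spans
// of a given trace always reach the same worker.
func workerFor(id [16]byte, numWorkers int) int {
	h := fnv.New32a()
	h.Write(id[:])
	return int(h.Sum32() % uint32(numWorkers))
}

func main() {
	batch := []span{
		{traceID: [16]byte{1}, name: "a"},
		{traceID: [16]byte{2}, name: "b"},
		{traceID: [16]byte{1}, name: "c"},
	}
	for id, spans := range splitByTrace(batch) {
		fmt.Printf("trace %x -> worker %d (%d spans)\n", id[:2], workerFor(id, 4), len(spans))
	}
}
```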
I'm a bit worried that we might not have enough tests with NumWorkers higher than 1, so possible race conditions might go undetected.
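One way to exercise that concern, sketched here under assumed names (this is not the processor's actual test suite): run a test with more than one worker and several concurrent producers under `go test -race`, keeping all per-trace state worker-local so the race detector flags any accidental sharing.

```go
// Illustrative test pattern only; run with: go test -race
package main

import (
	"hash/fnv"
	"sync"
	"testing"
)

const numWorkers = 4

// workerFor maps a trace ID to a worker index (assumed routing scheme).
func workerFor(traceID [16]byte) int {
	h := fnv.New32a()
	h.Write(traceID[:])
	return int(h.Sum32() % uint32(numWorkers))
}

func TestNumWorkersGreaterThanOne(t *testing.T) {
	// One queue and one private trace map per worker: no locks are needed as
	// long as routing by trace ID is respected.
	queues := make([]chan [16]byte, numWorkers)
	perWorkerTraces := make([]map[[16]byte]int, numWorkers)
	var workers sync.WaitGroup
	for i := range queues {
		queues[i] = make(chan [16]byte, 128)
		perWorkerTraces[i] = make(map[[16]byte]int)
		workers.Add(1)
		go func(w int) {
			defer workers.Done()
			for id := range queues[w] {
				perWorkerTraces[w][id]++ // unsynchronized, but worker-local
			}
		}(i)
	}

	// Several concurrent producers, as a real pipeline would have.
	var producers sync.WaitGroup
	for p := 0; p < 8; p++ {
		producers.Add(1)
		go func(p int) {
			defer producers.Done()
			for i := 0; i < 500; i++ {
				id := [16]byte{byte(p), byte(i), byte(i >> 8)}
				queues[workerFor(id)] <- id
			}
		}(p)
	}
	producers.Wait()
	for _, q := range queues {
		close(q)
	}
	workers.Wait()

	total := 0
	for _, m := range perWorkerTraces {
		total += len(m)
	}
	if total != 8*500 {
		t.Fatalf("expected %d distinct trace IDs, got %d", 8*500, total)
	}
}
```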
I like this PR a lot, and I think you mentioned some numbers elsewhere. I think this would be an improvement, but would you be able to publish your numbers here? The code is now more complex than before, and I would like to see the numbers to confirm that the new complexity is worth it.
Benchmark with 1 worker on a 2-core machine (+2 hyperthreads)
Benchmark with 2 workers
I cannot really measure the impact in our production environment, since we still haven't upgraded the collector version due to some dependencies. You can run the benchmark on something more powerful to see the difference.
The numbers in there are indeed not very exciting. I'll try to run some tests on a bare-metal machine that I have access to, but I will probably only have time to do it next week. If this gets stale, let me know and I'll remove the label (or try making this a draft PR).
Isn't performance one of the main targets of OpenTelemetry? A 20-30% improvement with 2 workers already seems really good, doesn't it? Anyway, it would be nice to test it in a real environment.
22327 ns is about 0.022 ms. This complexity is saving 0.006 ms per operation, if my math is right. Not sure it's really worth it, to be honest.
I cannot provide exact numbers, but the fact is that the current implementation simply cannot sustain a throughput of 50-60k input events per second, and having 16, 32, 64, ... cores on the machine doesn't help because of the single worker. This is really about throughput more than latency.
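An editorial back-of-the-envelope check, assuming the roughly 22 µs/operation figure quoted above approximates the per-event cost of a single worker (an assumption, not a measurement from this PR):

$$\text{single-worker ceiling} \approx \frac{1}{22.3\ \mu\text{s/event}} \approx 4.5 \times 10^{4}\ \text{events/s}$$

which is below the 50-60k events/s input rate mentioned here; adding workers raises this ceiling roughly in proportion to their number, until shared state or the downstream consumer becomes the bottleneck.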
Can you give numbers supporting that this change helps increase the throughput for this processor?
This PR was marked stale due to lack of activity. It will be closed in 7 days.
Status update: I'm working on the performance comparison between the groupbytrace + policy sampling processor vs. tail sampling processor and I'm picking the code from this PR to test as well. |
Force-pushed from aec8e73 to 839f33b
I've got good numbers from this PR: the first series is the current code, the second is this PR with 10 workers. I'm happy with this PR as is. @pkositsyn, would you like to take another look, just to make sure I didn't do anything wrong during the rebase?
The graphs above are part of the perf tests recorded here: https://github.com/jpkrohling/groupbytrace-tailbasedsampling-perf-comparison/tree/main/results/2021-05-03-groupbytrace-10-pr2902
Force-pushed from 839f33b to 6175cda
@tigrannajaryan, @bogdandrutu, this PR is ready to be merged from my perspective, if the load test failures are intermittent.
@jpkrohling for load test failures, please check why they fail. If we are normally running close to the limits (check some previous "main" branch builds), then increase the limits. We aim for limits to be about 30% above the max observed.
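To make that rule concrete with an illustrative number (not taken from actual builds): if previous main builds peak at, say, 23% CPU for this load test, the limit would be set to roughly 23 × 1.3 ≈ 30.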
Will do it whenever I see a failure again!
Signed-off-by: Pavel Kositsyn [email protected]
Description:
This PR adds workers for processing traces in the event machine. Currently, only one thread does the processing, which can often be a bottleneck. Some of the problems were discussed in the tracking issue linked below.
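A minimal sketch of the idea, added for illustration; the type names (eventMachine, the NumWorkers-style fan-out) and the trace-ID hashing are assumptions, not the processor's actual implementation:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// event is a stand-in for the processor's internal event type.
type event struct {
	traceID [16]byte
	payload string
}

// eventMachine fans events out to a fixed number of worker goroutines instead
// of a single one. Each worker owns its own channel, and a given trace ID is
// always routed to the same worker, so per-trace state needs no locking.
type eventMachine struct {
	workers []chan event
	wg      sync.WaitGroup
}

func newEventMachine(numWorkers, bufferSize int, handle func(event)) *eventMachine {
	m := &eventMachine{workers: make([]chan event, numWorkers)}
	for i := range m.workers {
		ch := make(chan event, bufferSize)
		m.workers[i] = ch
		m.wg.Add(1)
		go func() {
			defer m.wg.Done()
			for e := range ch {
				handle(e)
			}
		}()
	}
	return m
}

// fire routes an event to the worker chosen from its trace ID.
func (m *eventMachine) fire(e event) {
	h := fnv.New32a()
	h.Write(e.traceID[:])
	m.workers[h.Sum32()%uint32(len(m.workers))] <- e
}

// shutdown closes the worker channels and waits for in-flight events.
func (m *eventMachine) shutdown() {
	for _, ch := range m.workers {
		close(ch)
	}
	m.wg.Wait()
}

func main() {
	m := newEventMachine(4, 64, func(e event) { fmt.Println("processed:", e.payload) })
	m.fire(event{traceID: [16]byte{1}, payload: "batch for trace 1"})
	m.fire(event{traceID: [16]byte{2}, payload: "batch for trace 2"})
	m.shutdown()
}
```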
@jpkrohling, could you take a high-level look at this before I add tests?
Link to tracking Issue: #1710
Testing:
To be added
Documentation:
Changed the tags for the metric