Parallelize node preloads #454
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:

@@            Coverage Diff             @@
##             main     #454      +/-   ##
==========================================
- Coverage   88.61%   88.04%   -0.58%
==========================================
  Files          39       38       -1
  Lines        9109     8288     -821
==========================================
- Hits         8072     7297     -775
+ Misses       1037      991      -46

☔ View full report in Codecov by Sentry.
Commits updated from 9badf3d to 56c9ba8.
slawlor left a comment:
Nifty! But I think we could do with making the bool arg an enum for safer usage (as I'm guessing @dillonrg would say 😛).
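(For illustration only, a minimal sketch of the kind of enum the reviewer is suggesting; `PreloadParallelism` and the call sites shown are hypothetical, not the crate's actual API.)

```rust
/// Hypothetical enum replacing a bare `bool` argument so that call sites are
/// self-describing; the name and variants here are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PreloadParallelism {
    Enabled,
    Disabled,
}

// Before: azks.preload_nodes(&storage, true).await?;   // what does `true` mean here?
// After:  azks.preload_nodes(&storage, PreloadParallelism::Enabled).await?;
```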
kevinlewi left a comment:
LGTM!
Overview
This PR adds parallelization for node preloading. This primarily benefits publishing, which performs node preloading prior to node insertion and audit proof generation.
During node preloading, we perform a breadth-first search on the tree starting from the root, fetching each level of the tree in sequence. This operation's latency has a fixed lower bound, since a minimum number of back-to-back round trips to the DB (one per tree level) is always required. However, the amount of work performed between DB reads is not trivial, and it can contribute significantly to the overall preload latency when publishing hundreds of thousands of keys.
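As a rough illustration of the level-by-level preload described above (a sketch only; `NodeId`, `NodeStore`, and `batch_get_children` are stand-ins for the crate's real node and storage abstractions):

```rust
use std::collections::HashSet;

// Stand-in types for illustration; the real crate uses its own node and
// storage abstractions.
type NodeId = u64;

trait NodeStore {
    // One batched DB read: fetch the given nodes and return their child ids.
    fn batch_get_children(&self, ids: &[NodeId]) -> Vec<NodeId>;
}

// Level-by-level (BFS) preload starting from the root. Each loop iteration
// issues exactly one batched DB read for the current level, so the number of
// sequential round trips is bounded below by the depth of the tree.
fn preload_bfs<S: NodeStore>(store: &S, root: NodeId) -> HashSet<NodeId> {
    let mut visited: HashSet<NodeId> = HashSet::new();
    let mut frontier = vec![root];
    while !frontier.is_empty() {
        visited.extend(frontier.iter().copied());
        // The work done between DB reads (deduplication, bookkeeping) happens here.
        frontier = store
            .batch_get_children(&frontier)
            .into_iter()
            .filter(|id| !visited.contains(id))
            .collect();
    }
    visited
}
```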
We can improve preload latency and bring it closer to that theoretical lower bound by parallelizing the work performed between DB fetches. In this PR, I do so by having each step in the BFS split the work for the next level into two chunks, which are then processed concurrently. The storage layer might also benefit from receiving DB operations in batches that are not overly large.
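A minimal sketch of that idea, assuming an async runtime and a hypothetical `fetch_and_expand` helper standing in for the batched storage read (names and signatures are assumptions, not this PR's actual code):

```rust
use futures::join;

// Process one BFS level by splitting it into two halves and running both
// halves concurrently; each half performs its own batched DB fetch plus the
// CPU work needed to derive the next level's node ids.
async fn process_level(frontier: Vec<u64>) -> Vec<u64> {
    let (left, right) = frontier.split_at(frontier.len() / 2);

    // The two halves overlap their DB latency and between-fetch CPU work.
    let (mut next, rest) = join!(
        fetch_and_expand(left.to_vec()),
        fetch_and_expand(right.to_vec())
    );
    next.extend(rest);
    next // frontier for the next BFS level
}

// Placeholder for the real storage call: fetch a chunk of nodes and return
// the ids of their children.
async fn fetch_and_expand(chunk: Vec<u64>) -> Vec<u64> {
    chunk
}
```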
Note that node preloading is used in both publishing and lookups, but parallelism is always disabled for the lookup path. This is because lookups are likely performed on a machine that services client requests, and we do not want a single lookup to consume all of the machine's resources and starve other client requests.
Benchmark
Ran the azks benchmark on trunk (fcd665a) and on this PR:
Batch Insertion (1000 Initial Leaves, 1000 Inserted Leaves)
Trunk: [benchmark output screenshot]
This PR: [benchmark output screenshot]

Batch Insertion (200,000 Initial Leaves, 200,000 Inserted Leaves)
Trunk: [benchmark output screenshot]
This PR: [benchmark output screenshot]
On my 10-core MacBook Pro, this yields a decent improvement (~27%). The improvement should be larger on machines with higher core counts and at higher load levels.