-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[Perf] Facilitate the caching of the block tree #3030
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: ljedrz <[email protected]>
Signed-off-by: ljedrz <[email protected]>
Signed-off-by: ljedrz <[email protected]>
|
I ran the numbers on a full maninnet ledger after these changes, and it's a clear improvement: Dumping the serialized block tree took 2.48s, and the resulting file was 965MiB. For future reference, I'm also attaching the updated flamegraph of a ledger load (w/o I'll analyze it to see if there's room for more improvement, but these would be follow-ups unrelated to the changes from this PR. |
|
|
||
| if let Ok(serialized_tree) = fs::read(&path) { | ||
| // Deserialize a ready block tree. | ||
| let ret = bincode::deserialize(&serialized_tree).or_else(|e| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be additional checks here to ensure the block tree is up-to-date? I think you can just compare the most recent block hashes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note L259: the file is removed as soon as it's loaded, in order to ensure that it won't be reused, and the new one is only created on shutdown (on snarkOS side); it's a valid concern if we ever want to cache the block tree outside of shutdown conditions, but I don't think we do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My worry was about clients moving ledger snapshots around and restoring an old file. Another, very unlikely, case is that the file gets corrupted but deserializes fine.
Adding another sanity check, if not too expensive, could save us some headaches in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, that's a fair concern; I'm not too savvy with the tree myself, do you have a suggestion how to easily sanity-check it? or @vicsn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The merkle tree's leaves are the block hashes.
Inside of struct MerkleTree, we still store hashes of those leaves in:
/// The internal hashes, from root to hashed leaves, of the full Merkle tree.
tree: Vec<PH::Hash>,
Looking at the API which MerkleTree exposes, I see 3 options. We should pick one which balances simplicity and speed:
- use
pub const fn number_of_leavesto see if it matches the number of blocks (not sure if this requires iterating over the entire db index). - use
pub fn provethe final leave see if it returns without error (not very idiomatic) - use
pub fn leaf_hashesto match the last block hash (though note you have to manually hash the leave to see if it matches)
|
Looks great to me! I left one comment about potentially loading an outdated cache, which I believe needs to addressed. Additionally, I wonder if we could store the block tree on |
Signed-off-by: ljedrz <[email protected]>
Signed-off-by: ljedrz <[email protected]>
|
PR looks great, will approve after Kai's comments are addressed |
Yeah, it did work just fine, and I guess there's no reason not to have it happen implicitly; I'll remove the snarkOS-side PR if this change is accepted. In other words, review comments addressed. |
|
After some additional tests I noticed that the |
a9bbcf9 to
9dfebec
Compare
Signed-off-by: ljedrz <[email protected]>
9dfebec to
66f8742
Compare
|
This is the right direction, but I'm not always seeing the |
|
@kaimast ok, I have a temporary shutdown solution for snarkOS to ensure that the Edit: filed as ProvableHQ/snarkOS#4021. |
vicsn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but let's await Kai
…hecked Signed-off-by: ljedrz <[email protected]>
|
I've introduced a very cheap (basically free) coherence check in |
| let latest_height = | ||
| ledger.vm.block_store().max_height().with_context(|| "Failed to load blocks from the ledger")?; | ||
| // Ensure that the greatest stored height matches that of the block tree. | ||
| ensure!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brilliant find - can you confirm that you manually tested that we get the expected error message? A simple way could be to run a network, remove a node's ledger but nót their block cache, and then to start the node again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did test locally that it works, though I haven't swapped the cached tree to check the error; that being said, I initially used the value of 1 instead of 0 for the genesis scenario, and the node did fail to load (but I don't remember if the message was there). I can double-check with a swapped ledger as soon as I can.
…h the storage Signed-off-by: ljedrz <[email protected]>
|
I've tested both possible cases:
|
| // Construct the block tree. | ||
| debug!("Found {} blocks on disk", hashes.len()); | ||
| let tree = N::merkle_tree_bhp(&hashes)?; | ||
| let tree = storage.create_block_tree()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add subsequent checks here to ensure that the root of the block tree matches the proper state root in the state_root_map for the expected block height that the tree is supposed to represent?
If this fails, we would need to reconstruct the tree from scratch.
|
|
||
| /// Serializes and persists the current block tree. | ||
| #[cfg(feature = "rocks")] | ||
| pub fn cache_block_tree(&self) -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a lock we can add to enforce that this operation is properly performed and completed at shutdown?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is executed as part of the Drop impl for the Ledger, which is a synchronous operation and always fully concludes on a clean shutdown, and in snarkOS, even in case of a panic (as it uses the default panic = "unwind" mode); using a lock isn't necessary, as the aforementioned impl has full exclusive ownership of the Ledger (via &mut self).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this documented somewhere in the code? It'd be nice to indicate the atomic properties of the Drop impl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the block store also dropped when dropping the ledger? Maybe we could not expose this function at all, and just have it happen on drop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@raychu86 I extended the docs for the Drop impl.
@kaimast I think there is utility in having it as a method on BlockStore too, in case we want to use it for some test run or if we ever need to extend the Drop impl for the Ledger (which is the right impl to use, as all the stores only ever exist as part of the Ledger, so it's a clear central point for cleanups).
Signed-off-by: ljedrz <[email protected]>
Signed-off-by: ljedrz <[email protected]>
Signed-off-by: ljedrz <[email protected]>
kaimast
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I think @raychu86 comments are all addressed as well.
|
@ljedrz can you merge in origin/staging? |
Signed-off-by: ljedrz <[email protected]>
|
Merged |
This PR targets #3026.
I used
pprofto analyze the loading of a full mainnet ledger, and the resulting flamegraph:indicated that nearly 96% of the associated work is the creation of the block tree. I double-checked it, and indeed, around 90% of the loading time was spent on that operation.
These changes facilitate the caching of the block tree alongside the ledger, and automatically loading it on startup. A snarkOS counterpart PR will follow, where one of the methods introduced here will be automatically called on shutdown.
While I don't have the full mainnet ledger locally, I have verified that the one I do have loads more quickly when these changes are introduced.