Skip to content

Conversation

@ljedrz
Copy link
Collaborator

@ljedrz ljedrz commented Nov 18, 2025

This PR targets #3026.

I used pprof to analyze the loading of a full mainnet ledger, and the resulting flamegraph:

flamegraph

indicated that nearly 96% of the associated work is the creation of the block tree. I double-checked it, and indeed, around 90% of the loading time was spent on that operation.

These changes facilitate the caching of the block tree alongside the ledger, and automatically loading it on startup. A snarkOS counterpart PR will follow, where one of the methods introduced here will be automatically called on shutdown.

While I don't have the full mainnet ledger locally, I have verified that the one I do have loads more quickly when these changes are introduced.

@ljedrz
Copy link
Collaborator Author

ljedrz commented Nov 20, 2025

I ran the numbers on a full maninnet ledger after these changes, and it's a clear improvement:

before: 239.75s
after:   27.82s (down 88%)

Dumping the serialized block tree took 2.48s, and the resulting file was 965MiB.

For future reference, I'm also attaching the updated flamegraph of a ledger load (w/o rayon for increased clarity):

flamegraph_new

I'll analyze it to see if there's room for more improvement, but these would be follow-ups unrelated to the changes from this PR.


if let Ok(serialized_tree) = fs::read(&path) {
// Deserialize a ready block tree.
let ret = bincode::deserialize(&serialized_tree).or_else(|e| {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be additional checks here to ensure the block tree is up-to-date? I think you can just compare the most recent block hashes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note L259: the file is removed as soon as it's loaded, in order to ensure that it won't be reused, and the new one is only created on shutdown (on snarkOS side); it's a valid concern if we ever want to cache the block tree outside of shutdown conditions, but I don't think we do

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My worry was about clients moving ledger snapshots around and restoring an old file. Another, very unlikely, case is that the file gets corrupted but deserializes fine.
Adding another sanity check, if not too expensive, could save us some headaches in the future.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, that's a fair concern; I'm not too savvy with the tree myself, do you have a suggestion how to easily sanity-check it? or @vicsn?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The merkle tree's leaves are the block hashes.

Inside of struct MerkleTree, we still store hashes of those leaves in:

    /// The internal hashes, from root to hashed leaves, of the full Merkle tree.
    tree: Vec<PH::Hash>,

Looking at the API which MerkleTree exposes, I see 3 options. We should pick one which balances simplicity and speed:

  • use pub const fn number_of_leaves to see if it matches the number of blocks (not sure if this requires iterating over the entire db index).
  • use pub fn prove the final leave see if it returns without error (not very idiomatic)
  • use pub fn leaf_hashes to match the last block hash (though note you have to manually hash the leave to see if it matches)

@kaimast
Copy link
Collaborator

kaimast commented Nov 20, 2025

Looks great to me! I left one comment about potentially loading an outdated cache, which I believe needs to addressed.

Additionally, I wonder if we could store the block tree on Ledger::drop and not have snarkOS counterpart. That would be cleaner and ensure it is only invoked during shutdown. snarkOS shutdown is a little messy as of now, though, so that might be not possible yet (hence ProvableHQ/snarkOS#3915).

@vicsn
Copy link
Collaborator

vicsn commented Nov 21, 2025

PR looks great, will approve after Kai's comments are addressed

@ljedrz
Copy link
Collaborator Author

ljedrz commented Nov 21, 2025

I wonder if we could store the block tree on Ledger::drop and not have snarkOS counterpart.

Yeah, it did work just fine, and I guess there's no reason not to have it happen implicitly; I'll remove the snarkOS-side PR if this change is accepted.

In other words, review comments addressed.

@ljedrz
Copy link
Collaborator Author

ljedrz commented Nov 21, 2025

After some additional tests I noticed that the Ledger is not internally Arced, so the Drop impl needs to be adjusted; I should have it ready soon.

@ljedrz ljedrz force-pushed the perf/cache_block_tree branch from a9bbcf9 to 9dfebec Compare November 21, 2025 12:17
@ljedrz ljedrz marked this pull request as ready for review November 21, 2025 12:18
@ljedrz ljedrz force-pushed the perf/cache_block_tree branch from 9dfebec to 66f8742 Compare November 21, 2025 12:30
@ljedrz ljedrz marked this pull request as draft November 21, 2025 12:43
@ljedrz
Copy link
Collaborator Author

ljedrz commented Nov 21, 2025

This is the right direction, but I'm not always seeing the Drop being triggered; I need to run some more snarkOS tests.

@ljedrz ljedrz marked this pull request as ready for review November 21, 2025 13:15
@ljedrz
Copy link
Collaborator Author

ljedrz commented Nov 21, 2025

@kaimast ok, I have a temporary shutdown solution for snarkOS to ensure that the Drop gets triggered. I'll propose it, and it can later be superseded by your changes.

Edit: filed as ProvableHQ/snarkOS#4021.

Copy link
Collaborator

@vicsn vicsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but let's await Kai

@ljedrz
Copy link
Collaborator Author

ljedrz commented Nov 25, 2025

I've introduced a very cheap (basically free) coherence check in Ledger::load_unchecked; as a bonus, we now call BlockStore::max_height - which traverses the entire reverse state root map (meaning it is somewhat expensive, even though it is optimized) - only once.

let latest_height =
ledger.vm.block_store().max_height().with_context(|| "Failed to load blocks from the ledger")?;
// Ensure that the greatest stored height matches that of the block tree.
ensure!(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant find - can you confirm that you manually tested that we get the expected error message? A simple way could be to run a network, remove a node's ledger but nót their block cache, and then to start the node again

Copy link
Collaborator Author

@ljedrz ljedrz Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did test locally that it works, though I haven't swapped the cached tree to check the error; that being said, I initially used the value of 1 instead of 0 for the genesis scenario, and the node did fail to load (but I don't remember if the message was there). I can double-check with a swapped ledger as soon as I can.

@ljedrz
Copy link
Collaborator Author

ljedrz commented Nov 26, 2025

I've tested both possible cases:

  1. when the block tree is above the storage, BlockStore::open now fails with Block {height} exists in tree but not in storage; perhaps you used a wrong block tree cache file?
  2. when the block tree is below the storage, Ledger::load_unchecked now fails with The stored height is different than the one in the block tree; please ensure that the cached block tree is valid or delete it

@raychu86 raychu86 assigned raychu86 and unassigned raychu86 Nov 26, 2025
// Construct the block tree.
debug!("Found {} blocks on disk", hashes.len());
let tree = N::merkle_tree_bhp(&hashes)?;
let tree = storage.create_block_tree()?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add subsequent checks here to ensure that the root of the block tree matches the proper state root in the state_root_map for the expected block height that the tree is supposed to represent?

If this fails, we would need to reconstruct the tree from scratch.


/// Serializes and persists the current block tree.
#[cfg(feature = "rocks")]
pub fn cache_block_tree(&self) -> Result<()> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a lock we can add to enforce that this operation is properly performed and completed at shutdown?

Copy link
Collaborator Author

@ljedrz ljedrz Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is executed as part of the Drop impl for the Ledger, which is a synchronous operation and always fully concludes on a clean shutdown, and in snarkOS, even in case of a panic (as it uses the default panic = "unwind" mode); using a lock isn't necessary, as the aforementioned impl has full exclusive ownership of the Ledger (via &mut self).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this documented somewhere in the code? It'd be nice to indicate the atomic properties of the Drop impl

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the block store also dropped when dropping the ledger? Maybe we could not expose this function at all, and just have it happen on drop?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@raychu86 I extended the docs for the Drop impl.
@kaimast I think there is utility in having it as a method on BlockStore too, in case we want to use it for some test run or if we ever need to extend the Drop impl for the Ledger (which is the right impl to use, as all the stores only ever exist as part of the Ledger, so it's a clear central point for cleanups).

@ljedrz ljedrz requested a review from raychu86 November 27, 2025 14:06
@kaimast kaimast self-requested a review December 5, 2025 21:36
Copy link
Collaborator

@kaimast kaimast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I think @raychu86 comments are all addressed as well.

@vicsn
Copy link
Collaborator

vicsn commented Dec 18, 2025

@ljedrz can you merge in origin/staging?

@ljedrz
Copy link
Collaborator Author

ljedrz commented Dec 19, 2025

Merged staging and addressed the final suggestions.

@vicsn vicsn merged commit 4e3855d into ProvableHQ:staging Dec 19, 2025
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants