Skip to content

Conversation

dentiny
Copy link
Owner

@dentiny dentiny commented Aug 27, 2025

Summary

Briefly explain what this PR does.

Related Issues

Closes # or links to related issues.

Changes

Checklist

  • Code builds correctly
  • Tests have been added or updated
  • Documentation updated if necessary
  • I have reviewed my own changes

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @dentiny, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the asynchronous flush mechanism within the Mooncake table storage system. The primary goal is to accurately track and manage ongoing flush operations, particularly when multiple flushes might share the same Logical Sequence Number (LSN) or complete out of order. This is achieved by enhancing the LSN tracking from a simple presence-based set to a count-based map and introducing a mechanism to record completed but unpersisted flush LSNs, ensuring that Iceberg snapshots correctly reflect the state of flushed data and maintain data consistency.

Highlights

  • Enhanced LSN Tracking: The ongoing_flush_lsns structure in MooncakeTable has been upgraded from a BTreeSet to a BTreeMap<u64, u32>. This change allows the system to track the count of ongoing flush operations for a given LSN, addressing scenarios where multiple flushes might share the same LSN.
  • Improved Flush Completion Logic: The remove_ongoing_flush_lsn function now decrements a counter for an LSN and only removes the LSN from the ongoing_flush_lsns map when its count reaches zero. Completed LSNs are also temporarily stored in a new unrecorded_flush_lsns set to ensure proper new_flush_lsn determination for Iceberg snapshots.
  • Accurate Iceberg Snapshot LSN Determination: The try_set_next_flush_lsn logic has been updated to consider the unrecorded_flush_lsns when no flushes are ongoing, allowing the system to correctly advance the new_flush_lsn for Iceberg snapshots based on the largest completed LSN.
  • Replay Tooling Enhancements: The chaos replay tool has been improved to validate persisted Iceberg table snapshots, ensuring data consistency during replay scenarios. This includes changes to how table metadata and configurations are handled during replay setup and a new validation utility.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the asynchronous flush handling logic, notably by changing ongoing_flush_lsns to a BTreeMap to count concurrent flushes for the same LSN and introducing unrecorded_flush_lsns to track completed flushes. This is a good approach to correctly determine the flush LSN for Iceberg snapshots, especially with out-of-order completions.

My review has identified a critical bug in an assertion that could lead to a panic, which needs to be fixed. I've also pointed out some leftover debugging statements and a hardcoded test seed that should be addressed before merging.

// If there're no ongoing flushes, we're free to set flush LSN to the largest of completed ones.
if !self.has_ongoing_flush() && !self.unrecorded_flush_lsns.is_empty() {
let smallest_flush_lsn = self.unrecorded_flush_lsns.first().unwrap();
ma::assert_le!(self.next_snapshot_task.new_flush_lsn.unwrap(), *smallest_flush_lsn);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This assertion is incorrect and will cause a panic in scenarios with out-of-order flush completions.

For example, if flushes for LSNs {10, 20, 30} are processed and complete in the order {20, 10, 30}, new_flush_lsn will be Some(30) when try_set_next_flush_lsn(30) is called. At this point, smallest_flush_lsn will be 10, causing assert_le!(30, 10) to fail.

The logic to set new_flush_lsn to the largest completed LSN seems correct, but this assertion is flawed and should be removed.

.lsn()
.expect("LSN should never be none for non streaming flush");
self.remove_ongoing_flush_lsn(lsn);
println!("set flush lsn {} at {:?}:{:?}", lsn, file!(), line!());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This println! statement and others added in this PR (e.g., line 916, 1233, and in files snapshot.rs, transaction_stream.rs, chaos_replay.rs) appear to be for debugging. They should be removed before merging to keep the codebase clean.

Comment on lines +66 to +70
// SystemTime::now()
// .duration_since(UNIX_EPOCH)
// .unwrap()
// .as_nanos() as u64
1756240686726817373

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The random seed for this chaos test is hardcoded. While useful for reproducing a specific failure, this should be reverted to a time-based or otherwise random seed for general testing to ensure a variety of scenarios are covered.

Suggested change
// SystemTime::now()
// .duration_since(UNIX_EPOCH)
// .unwrap()
// .as_nanos() as u64
1756240686726817373
SystemTime::now()
.duration_since(UNIX_EPOCH)
.unwrap()
.as_nanos() as u64

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant