-
Notifications
You must be signed in to change notification settings - Fork 0
Hjiang/fix async flush #70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @dentiny, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request refines the asynchronous flush mechanism within the Mooncake table storage system. The primary goal is to accurately track and manage ongoing flush operations, particularly when multiple flushes might share the same Logical Sequence Number (LSN) or complete out of order. This is achieved by enhancing the LSN tracking from a simple presence-based set to a count-based map and introducing a mechanism to record completed but unpersisted flush LSNs, ensuring that Iceberg snapshots correctly reflect the state of flushed data and maintain data consistency.
Highlights
- Enhanced LSN Tracking: The
ongoing_flush_lsns
structure inMooncakeTable
has been upgraded from aBTreeSet
to aBTreeMap<u64, u32>
. This change allows the system to track the count of ongoing flush operations for a given LSN, addressing scenarios where multiple flushes might share the same LSN. - Improved Flush Completion Logic: The
remove_ongoing_flush_lsn
function now decrements a counter for an LSN and only removes the LSN from theongoing_flush_lsns
map when its count reaches zero. Completed LSNs are also temporarily stored in a newunrecorded_flush_lsns
set to ensure propernew_flush_lsn
determination for Iceberg snapshots. - Accurate Iceberg Snapshot LSN Determination: The
try_set_next_flush_lsn
logic has been updated to consider theunrecorded_flush_lsns
when no flushes are ongoing, allowing the system to correctly advance thenew_flush_lsn
for Iceberg snapshots based on the largest completed LSN. - Replay Tooling Enhancements: The chaos replay tool has been improved to validate persisted Iceberg table snapshots, ensuring data consistency during replay scenarios. This includes changes to how table metadata and configurations are handled during replay setup and a new validation utility.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command>
or @gemini-code-assist <command>
. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/
folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the asynchronous flush handling logic, notably by changing ongoing_flush_lsns
to a BTreeMap
to count concurrent flushes for the same LSN and introducing unrecorded_flush_lsns
to track completed flushes. This is a good approach to correctly determine the flush LSN for Iceberg snapshots, especially with out-of-order completions.
My review has identified a critical bug in an assertion that could lead to a panic, which needs to be fixed. I've also pointed out some leftover debugging statements and a hardcoded test seed that should be addressed before merging.
// If there're no ongoing flushes, we're free to set flush LSN to the largest of completed ones. | ||
if !self.has_ongoing_flush() && !self.unrecorded_flush_lsns.is_empty() { | ||
let smallest_flush_lsn = self.unrecorded_flush_lsns.first().unwrap(); | ||
ma::assert_le!(self.next_snapshot_task.new_flush_lsn.unwrap(), *smallest_flush_lsn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assertion is incorrect and will cause a panic in scenarios with out-of-order flush completions.
For example, if flushes for LSNs {10, 20, 30} are processed and complete in the order {20, 10, 30}, new_flush_lsn
will be Some(30)
when try_set_next_flush_lsn(30)
is called. At this point, smallest_flush_lsn
will be 10, causing assert_le!(30, 10)
to fail.
The logic to set new_flush_lsn
to the largest completed LSN seems correct, but this assertion is flawed and should be removed.
.lsn() | ||
.expect("LSN should never be none for non streaming flush"); | ||
self.remove_ongoing_flush_lsn(lsn); | ||
println!("set flush lsn {} at {:?}:{:?}", lsn, file!(), line!()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// SystemTime::now() | ||
// .duration_since(UNIX_EPOCH) | ||
// .unwrap() | ||
// .as_nanos() as u64 | ||
1756240686726817373 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The random seed for this chaos test is hardcoded. While useful for reproducing a specific failure, this should be reverted to a time-based or otherwise random seed for general testing to ensure a variety of scenarios are covered.
// SystemTime::now() | |
// .duration_since(UNIX_EPOCH) | |
// .unwrap() | |
// .as_nanos() as u64 | |
1756240686726817373 | |
SystemTime::now() | |
.duration_since(UNIX_EPOCH) | |
.unwrap() | |
.as_nanos() as u64 |
Summary
Briefly explain what this PR does.
Related Issues
Closes # or links to related issues.
Changes
Checklist