feat(batch): query historical epoch data #6840
Conversation
Codecov Report
@@ Coverage Diff @@
## main #6840 +/- ##
==========================================
- Coverage 72.79% 72.75% -0.04%
==========================================
Files 1035 1035
Lines 166044 166347 +303
==========================================
+ Hits 120864 121034 +170
- Misses 45180 45313 +133
The historical query works, but performance is ridiculously bad, even for a simple […].
It's because […].
Caused by #6157?
hzxa21 left a comment:
Rest LGTM.
| "MAX_SPLIT_RANGE_GAP", | ||
| "SEARCH_PATH", | ||
| "TRANSACTION ISOLATION LEVEL", | ||
| "QUERY_EPOCH", |
Users don't understand epochs, so asking the user to manually specify the exact query epoch for a time-travel query is not realistic. In terms of the syntax, I did a quick check on PG's docs and sadly it seems to have deprecated its time-travel query SQL command. I suggest we do either of the following options:
- Support querying the backup epochs, as well as looking up backup epochs by timestamp. With that, the user can find out the exact query epoch to use for a time-travel query.
- Support `AS OF` syntax in SELECT: `SELECT ... FROM ... AS OF <timestamp>`. This is easier to use, but may confuse users because this syntax seems to indicate that we can support time-travel query without backup, which is not the case.

For now, I think 1) is sufficient since we don't expect users to run time-travel queries very often. cc @fuyufjh any suggestions?
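To make the two proposals concrete, a hypothetical sketch (the `backup_epochs` relation in option 1 is a placeholder name only, later realized as `rw_catalog.rw_meta_snapshot`; table `t`, the timestamp, and the epoch value are illustrative):

```sql
-- Option 1: look up backed-up epochs first, then pin the session to one.
SELECT epoch FROM backup_epochs
WHERE ts <= '2022-12-19 06:41:00'
ORDER BY ts DESC LIMIT 1;
SET QUERY_EPOCH = 3551845695750144;
SELECT * FROM t;

-- Option 2: inline AS OF syntax, as proposed above.
SELECT * FROM t AS OF '2022-12-19 06:41:00';
```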
However, I think it is okay to keep the QUERY_EPOCH session config for admin usage and debugging.
I think both options are fine.
- With option 1, the user can get an exact epoch, then still use it via `QUERY_EPOCH`, right? Or shall we use another, unambiguous session config, e.g. `QUERY_BACKUP_EPOCH`?
- To implement option 2, the frontend needs to be aware of (and sync) the backup manifest, so that it can choose a best-fit query epoch for the given arbitrary `<timestamp>`.
Sorry for the late reply. For option 1, I mean we provide a way to get the backed-up epochs via SQL statement (preferred) or risectl. Then we can set QUERY_EPOCH accordingly.
Followed option 1.
Added a new system table: `rw_catalog.rw_meta_snapshot`. Querying it returns all available backups, along with their human-readable timestamp ranges (`safe_epoch_ts` and `max_committed_epoch_ts`). `safe_epoch_ts` of the first row is NULL, because `safe_epoch=0` is not a valid epoch.
dev=> select * from rw_catalog.rw_meta_snapshot;
meta_snapshot_id | hummock_version_id | safe_epoch | safe_epoch_ts | max_committed_epoch | max_committed_epoch_ts
------------------+--------------------+------------------+-------------------------+---------------------+-------------------------
1 | 5 | 0 | | 3551844974919680 | 2022-12-19 06:40:53.255
2 | 9 | 3551845695750144 | 2022-12-19 06:41:04.254 | 3551847139508224 | 2022-12-19 06:41:26.284
(2 rows)
As the size of the backup manifest is small enough, I make the frontend query it directly from the meta node.
Alternatively, we could make the frontend read the backup manifest from backup storage instead of the meta node, but that doesn't bring significant benefit now, so it's not adopted.
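Putting the pieces together, a sketch of how a user could pick a query epoch from the new table (values taken from the sample output above; table `t` is illustrative):

```sql
-- Find the most recent snapshot committed before the time of interest.
SELECT max_committed_epoch, max_committed_epoch_ts
FROM rw_catalog.rw_meta_snapshot
WHERE max_committed_epoch_ts <= TIMESTAMP '2022-12-19 06:41:30'
ORDER BY max_committed_epoch_ts DESC
LIMIT 1;

-- Pin the session to that epoch and run historical queries.
SET QUERY_EPOCH = 3551847139508224;
SELECT * FROM t;
```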
Currently, backup creation and deletion are still done via risectl. We can also support `create backup` and `delete backup <id>` SQL later.
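A sketch of what that future SQL surface might look like (purely hypothetical; neither statement is implemented in this PR):

```sql
CREATE BACKUP;     -- would replace `risectl meta backup-meta`
DELETE BACKUP 1;   -- would drop the meta snapshot with id 1
```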
    ..Default::default()
}),
meta_backup_manifest_id: Some(MetaBackupManifestId {
    id: meta_backup_manifest_id,
Shouldn't we send all backup manifest ids here instead of just the latest one?
Got it after reading backup_reader.rs. The notification is used to trigger a backup reader refresh, which will fetch all the manifests.
let mut committed_version = (**pinned_version).clone();
if epoch < pinned_version.safe_epoch() {
    // try to query hummock version from backup
    if let Some(backup_version) = self.backup_reader.try_get_hummock_version(epoch).await {
I am thinking whether we should make reading from backup more explicit, instead of just doing it internally in the state store implementation. We can clearly identify misuse or potential bugs by doing so. My thoughts:
- Add another type for backup read epoch in `HummockReadEpoch`.
- Add another interface or read option for backup read, and use this interface only when the `HummockReadEpoch` is for backup.
Agree with the benefit of "clearly identifying misuse or potential bugs". However, that also means the frontend needs to be aware of "reading from backup", which is what the current implementation aims to avoid. But if we will support "reading from backup for dropped tables", then the frontend has to be aware of it anyway.
Will refactor.
Refactored.
I've added a new system schema "rw_catalog" (cea5686).
I think it's necessary to add some details for it in […].
hzxa21 left a comment:
LGTM! Thanks for the PR!
I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.
What's changed and what's your intention?
This PR supports "query historical epoch data", which is powered by meta backup under the hood.
Typical use case:
- Run `risectl meta backup-meta` from time to time.
- Pick a historical epoch covered by the [`safe_epoch`, `max_committed_epoch`] range of at least one meta snapshot.
- `SET QUERY_EPOCH=3551845695750200`. Then all `SELECT`s in this session return data as of this epoch.
- `SET QUERY_EPOCH=0` to disable historical query.

Implementation: […]
Querying a schema-changed or dropped table is not supported in this PR: […]
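For reference, a minimal sketch of the session flow from the use case above (the epoch value is the illustrative one; table `t` is a placeholder):

```sql
-- All SELECTs in this session now read as of the given historical epoch,
-- provided it is covered by some meta snapshot.
SET QUERY_EPOCH = 3551845695750200;
SELECT * FROM t;

-- Reset to read the latest data again.
SET QUERY_EPOCH = 0;
```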
Example with `QUERY_EPOCH` can be found in `test_query_backup.sh`.

Checklist
- `./risedev check` (or alias, `./risedev c`)

Documentation
If your pull request contains user-facing changes, please specify the types of the changes, and create a release note. Otherwise, please feel free to remove this section.
Types of user-facing changes
Please keep the types that apply to your changes, and remove those that do not apply.
Release note
- Set `QUERY_EPOCH` to a valid epoch that is covered by any meta snapshot.
- New system schema `rw_catalog`: it contains information specific to RisingWave, compared with the existing system schemas `pg_catalog` and `information_schema`.
- New system table `rw_meta_snapshot` under `rw_catalog`: it contains info of all available meta snapshots.

Refer to a related PR or issue link (optional)
#6482