
Conversation

@zwang28
Contributor

@zwang28 zwang28 commented Dec 10, 2022

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR supports querying historical epoch data, which is powered by meta backups under the hood.

Typical use case:

  1. Create meta backups via risectl meta backup-meta from time to time.
  2. To query historical data, first find a valid epoch. A valid epoch is one that falls within the safe_epoch and max_committed_epoch of at least one meta snapshot:
dev=> select * from rw_catalog.rw_meta_snapshot;
 meta_snapshot_id | hummock_version_id |    safe_epoch    |      safe_epoch_ts      | max_committed_epoch | max_committed_epoch_ts  
------------------+--------------------+------------------+-------------------------+---------------------+-------------------------
                1 |                  5 |                0 |                         |    3551844974919680 | 2022-12-19 06:40:53.255
                2 |                  9 | 3551845695750144 | 2022-12-19 06:41:04.254 |    3551847139508224 | 2022-12-19 06:41:26.284
(2 rows)
  3. Suppose we choose the epoch 3551845695750200, which is covered by meta backup 2 above.
  4. SET QUERY_EPOCH=3551845695750200. Then all SELECTs in this session return data as of this epoch.
  5. SET QUERY_EPOCH=0 to disable historical queries.

Implementation:

  • frontend: support querying available meta backup info, in order to get a best-fit epoch to use.
  • frontend: support specifying the query_epoch session variable (a sketch of the intended behavior follows this list).
  • storage: support querying data using an arbitrary historical hummock version taken from a meta backup.
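
As a rough illustration of the second point (hypothetical names; not the PR's actual code), a non-zero QUERY_EPOCH pins the epoch a batch query reads at, and 0 falls back to the latest committed epoch:

// Illustrative sketch only: how a session-level QUERY_EPOCH could pick the
// epoch used by batch queries. Function and parameter names are made up.
fn resolve_query_epoch(session_query_epoch: u64, max_committed_epoch: u64) -> u64 {
    if session_query_epoch != 0 {
        // A non-zero QUERY_EPOCH pins every SELECT in the session to that epoch,
        // which may be served from a meta backup if it is older than safe_epoch.
        session_query_epoch
    } else {
        // QUERY_EPOCH=0 disables historical queries and reads the latest data.
        max_committed_epoch
    }
}

fn main() {
    assert_eq!(resolve_query_epoch(3551845695750200, 3551847139508224), 3551845695750200);
    assert_eq!(resolve_query_epoch(0, 3551847139508224), 3551847139508224);
}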

Querying schema-changed or dropped tables is not supported in this PR:

  • Querying historical data from such tables is doable, because a meta backup contains the hummock version along with the table catalogs. However, the schema change feature is not available yet, so shall we also postpone supporting such queries?

An example with QUERY_EPOCH can be found in test_query_backup.sh.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

If your pull request contains user-facing changes, please specify the types of the changes, and create a release note. Otherwise, please feel free to remove this section.

Types of user-facing changes

Please keep the types that apply to your changes, and remove those that do not apply.

  • SQL commands, functions, and operators

Release note

  • Add support for querying historical data from available meta snapshots. It is enabled by setting the session config QUERY_EPOCH to a valid epoch that is covered by any meta snapshot.
  • Add a system schema rw_catalog. rw_catalog contains information specific to RisingWave, compared with the existing system schemas pg_catalog and information_schema.
  • Add a system table rw_meta_snapshot under rw_catalog. It contains info about all available meta snapshots.

Refer to a related PR or issue link (optional)

#6482

@github-actions github-actions bot added the type/feature Type: New feature. label Dec 10, 2022
@zwang28 zwang28 force-pushed the wangzheng/query_backup branch from d9df1f0 to b856275 on December 11, 2022 05:22
@codecov

codecov bot commented Dec 11, 2022

Codecov Report

Merging #6840 (078d90a) into main (b2896cd) will decrease coverage by 0.03%.
The diff coverage is 43.01%.

@@            Coverage Diff             @@
##             main    #6840      +/-   ##
==========================================
- Coverage   72.79%   72.75%   -0.04%     
==========================================
  Files        1035     1035              
  Lines      166044   166347     +303     
==========================================
+ Hits       120864   121034     +170     
- Misses      45180    45313     +133     
Flag Coverage Δ
rust 72.75% <43.01%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...batch/src/executor/join/distributed_lookup_join.rs 0.00% <0.00%> (ø)
src/batch/src/executor/join/local_lookup_join.rs 55.43% <0.00%> (-0.40%) ⬇️
src/batch/src/rpc/service/task_service.rs 0.00% <0.00%> (ø)
src/common/common_service/src/observer_manager.rs 43.56% <0.00%> (-0.44%) ⬇️
src/common/src/catalog/mod.rs 71.42% <ø> (ø)
src/compute/src/server.rs 0.00% <0.00%> (ø)
src/ctl/src/cmd_impl/hummock/list_kv.rs 0.00% <0.00%> (ø)
src/ctl/src/common/hummock_service.rs 0.00% <0.00%> (ø)
...ntend/src/catalog/system_catalog/pg_catalog/mod.rs 0.00% <0.00%> (ø)
src/frontend/src/handler/query.rs 18.23% <0.00%> (-0.42%) ⬇️
... and 54 more


@zwang28 zwang28 marked this pull request as ready for review December 11, 2022 05:46
@zwang28 zwang28 force-pushed the wangzheng/query_backup branch from b856275 to 813742a on December 12, 2022 07:58
@zwang28 zwang28 force-pushed the wangzheng/query_backup branch from 813742a to d8d5c00 on December 12, 2022 13:03
@zwang28
Contributor Author

zwang28 commented Dec 12, 2022

The historical query works, but performance is ridiculously bad, even for a simple select * that triggers #vnode concurrent tasks. Investigating.

@zwang28 zwang28 marked this pull request as draft December 12, 2022 13:09
@zwang28
Contributor Author

zwang28 commented Dec 13, 2022

The historical query works, but performance is ridiculously bad, even for a simple select * that triggers #vnode concurrent tasks. Investigating.

It's because validate_epoch can take as long as 0.3 seconds in debug mode when it returns an error (HummockError::expired_epoch(safe_epoch, epoch)), and the historical query logic was implemented to run only after validate_epoch failed. Fixing (see the sketch of the fixed control flow below).
Fixed
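
To show the shape of the fix, here is a simplified sketch with stand-in types (not the actual Hummock code): the fall-back to a backup version is decided by comparing the requested epoch with the pinned version's safe_epoch up front, rather than by waiting for validate_epoch to return an error.

// Simplified stand-ins for the real Hummock types; illustration only.
struct Version {
    safe_epoch: u64,
}

struct BackupReader;

impl BackupReader {
    // Stand-in for looking the epoch up in the loaded backup manifests.
    fn try_get_hummock_version(&self, _epoch: u64) -> Option<Version> {
        None
    }
}

// Decide which version serves the read *before* epoch validation, so the
// backup path no longer depends on validate_epoch's slow error path.
fn choose_read_version(epoch: u64, pinned: Version, backup_reader: &BackupReader) -> Version {
    if epoch < pinned.safe_epoch {
        if let Some(backup_version) = backup_reader.try_get_hummock_version(epoch) {
            return backup_version;
        }
    }
    pinned
}

fn main() {
    let pinned = Version { safe_epoch: 100 };
    // With no backup loaded in this sketch, the pinned version is still used.
    assert_eq!(choose_read_version(50, pinned, &BackupReader).safe_epoch, 100);
}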

@BugenZhao
Member

It's because validate_epoch can take as long as 0.3 seconds in debug mode when it returns an error (HummockError::expired_epoch(safe_epoch, epoch)).

Caused by #6157?

@zwang28 zwang28 marked this pull request as ready for review December 13, 2022 04:53
@zwang28

This comment was marked as outdated.

@zwang28 zwang28 added A-storage Area: Storage. A-frontend Area: Protocol, parsing, binder. labels Dec 13, 2022
@zwang28 zwang28 requested a review from chenzl25 December 13, 2022 05:26
Collaborator
@hzxa21 hzxa21 left a comment

Rest LGTM.

"MAX_SPLIT_RANGE_GAP",
"SEARCH_PATH",
"TRANSACTION ISOLATION LEVEL",
"QUERY_EPOCH",
Collaborator

Users don't understand epochs, so asking them to manually specify the exact query epoch for a time-travel query is not realistic. In terms of syntax, I did a quick check on PG's docs and sadly it seems to have deprecated its time-travel query SQL command. I suggest we do either of the following options:

  1. Support querying the backup epochs as well as looking up backup epochs by timestamp. With that, users can find out the exact query epoch they should use for a time-travel query.
  2. Support AS OF syntax in SELECT: SELECT ... FROM ... AS OF <timestamp>. This is easier to use but may confuse users, because the syntax seems to indicate that we can support time-travel queries without backups, which is not the case.

For now, I think 1) is sufficient since we don't expect users to run time-travel queries very often. cc @fuyufjh any suggestions?

Collaborator

However, I think it is okay to keep the QUERY_EPOCH session config for admin usage and debugging.

Contributor Author
@zwang28 zwang28 Dec 15, 2022

I think both options are fine.

  • With option 1, the user can get an exact epoch and then still use it via QUERY_EPOCH, right? Or shall we use a less ambiguous session config, e.g. QUERY_BACKUP_EPOCH?
  • To implement option 2, the frontend needs to be aware of (and sync) the backup manifest, so it can choose a best-fit query epoch for the given arbitrary <timestamp>.

Collaborator

I think both options are fine.

  • With option 1, the user can get an exact epoch and then still use it via QUERY_EPOCH, right? Or shall we use a less ambiguous session config, e.g. QUERY_BACKUP_EPOCH?
  • To implement option 2, the frontend needs to be aware of (and sync) the backup manifest, so it can choose a best-fit query epoch for the given arbitrary <timestamp>.

Sorry for the late reply. For option 1, I mean we provide a way to get the backed-up epochs via a SQL statement (preferred) or risectl. Then we can set QUERY_EPOCH accordingly.

Contributor Author
@zwang28 zwang28 Dec 19, 2022

Followed option 1.
Added a new system table: rw_catalog.rw_meta_snapshot. Querying it returns all available backups along with their human-readable timestamp ranges (safe_epoch_ts and max_committed_epoch_ts). The safe_epoch_ts of the first row is NULL, because safe_epoch=0 is not a valid epoch.

dev=> select * from rw_catalog.rw_meta_snapshot;
 meta_snapshot_id | hummock_version_id |    safe_epoch    |      safe_epoch_ts      | max_committed_epoch | max_committed_epoch_ts  
------------------+--------------------+------------------+-------------------------+---------------------+-------------------------
                1 |                  5 |                0 |                         |    3551844974919680 | 2022-12-19 06:40:53.255
                2 |                  9 | 3551845695750144 | 2022-12-19 06:41:04.254 |    3551847139508224 | 2022-12-19 06:41:26.284
(2 rows)

As the backup manifest is small enough, I made the frontend query it directly from the meta node.
Alternatively, we could make the frontend read the backup manifest from backup storage instead of the meta node, but that doesn't bring significant benefit now, so it's not adopted.

Contributor Author

Currently, backup creation and deletion are still done via risectl. We could also support create backup and delete backup <id> SQL commands later.

..Default::default()
}),
meta_backup_manifest_id: Some(MetaBackupManifestId {
id: meta_backup_manifest_id,
Collaborator

Shouldn't we send all backup manifest ids here instead of just the latest one?

Collaborator

Got it after reading backup_reader.rs. The notification is used to trigger a backup reader refresh, which will fetch all the manifests.
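
A rough sketch of that pattern (the type and field names below are illustrative, not the actual backup_reader.rs code): the notification carries only the latest manifest id, and the reader reacts by reloading the full manifest list from backup storage.

// Illustration only; not the real BackupReader. The notification carries just
// the latest manifest id, which acts as a change signal.
struct BackupReaderSketch {
    loaded_manifest_id: u64,
    manifests: Vec<u64>,
}

impl BackupReaderSketch {
    fn on_notification(&mut self, latest_manifest_id: u64) {
        if latest_manifest_id > self.loaded_manifest_id {
            // A refresh loads *all* manifests, so every snapshot
            // (not only the latest one) becomes queryable.
            self.refresh_all_manifests(latest_manifest_id);
        }
    }

    fn refresh_all_manifests(&mut self, latest_manifest_id: u64) {
        // In the real code this would list and read manifests from backup storage;
        // here we just pretend ids are contiguous for the sake of the sketch.
        self.manifests = (1..=latest_manifest_id).collect();
        self.loaded_manifest_id = latest_manifest_id;
    }
}

fn main() {
    let mut reader = BackupReaderSketch { loaded_manifest_id: 0, manifests: vec![] };
    reader.on_notification(2);
    assert_eq!(reader.manifests, vec![1, 2]);
}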

let mut committed_version = (**pinned_version).clone();
if epoch < pinned_version.safe_epoch() {
    // try to query hummock version from backup
    if let Some(backup_version) = self.backup_reader.try_get_hummock_version(epoch).await {
Collaborator

I am thinking about whether we should make reading from backup more explicit instead of just doing it internally in the state store implementation. We can clearly identify misuse or potential bugs by doing so. My thoughts:

  1. Add another type for the backup read epoch in HummockReadEpoch.
  2. Add another interface or read option for backup reads, and use this interface only when the HummockReadEpoch is for backup.

@Li0k @wenym1 what do you think?
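
A minimal sketch of what option 1 might look like (the Backup variant and helper methods below are assumptions for illustration, not the crate's actual API): make backup reads a distinct case of the read epoch, so the state store branches on it explicitly instead of inferring it from the epoch value.

// Sketch only: a read-epoch enum with an extra, explicit variant for reads
// served from a meta backup. Variant and method names are illustrative.
pub enum HummockReadEpoch {
    // Read the snapshot of a committed epoch.
    Committed(u64),
    // Read the latest data without waiting for a committed epoch.
    Current(u64),
    // Read a historical epoch that must be served from a meta backup.
    Backup(u64),
}

impl HummockReadEpoch {
    pub fn get_epoch(&self) -> u64 {
        match self {
            Self::Committed(e) | Self::Current(e) | Self::Backup(e) => *e,
        }
    }

    // The state store can branch on this and use the backup read path only
    // when the frontend explicitly asked for it.
    pub fn is_backup(&self) -> bool {
        matches!(self, Self::Backup(_))
    }
}

fn main() {
    let epoch = HummockReadEpoch::Backup(3551845695750200);
    assert!(epoch.is_backup());
    assert_eq!(epoch.get_epoch(), 3551845695750200);
}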

Contributor Author
@zwang28 zwang28 Dec 15, 2022

Agree with the benefit of clearly identifying misuse or potential bugs. However, that also means the frontend needs to be aware of "reading from backup", which is what the current implementation aims to avoid. But if we are going to support reading from backup for dropped tables, then the frontend has to be aware of it anyway.

Will refactor.

Contributor Author

Refactored.

# Conflicts:
#	src/meta/src/backup_restore/backup_manager.rs
@zwang28 zwang28 force-pushed the wangzheng/query_backup branch from 38d8f2f to 3df5c77 on December 16, 2022 11:24
@zwang28
Contributor Author

zwang28 commented Dec 19, 2022

I've added a new system schema "rw_catalog" in cea5686.
Would you PTAL @yezizp2012 @HuaHuaY?

@zwang28 zwang28 requested a review from hzxa21 December 19, 2022 06:52
@yezizp2012
Member

yezizp2012 commented Dec 19, 2022

I've added a new system schema "rw_catalog" in cea5686. Would you PTAL @yezizp2012 @HuaHuaY?

I think it's necessary to add some details about it in the Documentation section and tag this PR with user-facing-changes.

@zwang28 zwang28 added the user-facing-changes Contains changes that are visible to users label Dec 19, 2022
# Conflicts:
#	src/batch/src/executor/join/distributed_lookup_join.rs
Collaborator
@hzxa21 hzxa21 left a comment

LGTM! Thanks for the PR!

@zwang28 zwang28 added this to the release-0.1.15 milestone Dec 19, 2022
@mergify mergify bot merged commit 829df13 into main Dec 19, 2022
@mergify mergify bot deleted the wangzheng/query_backup branch December 19, 2022 10:16

Labels

A-frontend Area: Protocol, parsing, binder. A-storage Area: Storage. type/feature Type: New feature. user-facing-changes Contains changes that are visible to users
