Skip to content

Conversation

@hzxa21
Copy link
Collaborator

@hzxa21 hzxa21 commented Jan 29, 2022

  • Add lifetime to StateStore::Iter and all types of hummock iterators
  • Refactor MViewTableIter to do pagination-style iteration on state store iterator (Avoid keeping a single StateStore iterator instance in MViewTableIter #10)
  • Refactor StateStore read interfaces to take a range and leave the prefix-related logic to keyspace.
  • Make all read interfaces epoch-aware. HummockSnapshot is no longer needed because snapshot read is enforced by the read interfaces via epoch.

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

part of #39

@hzxa21 hzxa21 marked this pull request as draft January 29, 2022 17:01
@codecov
Copy link

codecov bot commented Jan 29, 2022

Codecov Report

Merging #62 (09acbfd) into main (4fa67cf) will decrease coverage by 0.01%.
The diff coverage is 73.06%.

Impacted file tree graph

@@             Coverage Diff              @@
##               main      #62      +/-   ##
============================================
- Coverage     74.58%   74.57%   -0.02%     
  Complexity     2654     2654              
============================================
  Files           838      837       -1     
  Lines         47795    47875      +80     
  Branches       1562     1562              
============================================
+ Hits          35649    35703      +54     
- Misses        11347    11373      +26     
  Partials        799      799              
Flag Coverage Δ
java 62.08% <ø> (ø)
rust 79.96% <73.06%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
rust/bench/ss_bench/operations/get_random.rs 0.00% <0.00%> (ø)
rust/bench/ss_bench/operations/get_seq.rs 0.00% <0.00%> (ø)
...st/bench/ss_bench/operations/prefix_scan_random.rs 0.00% <0.00%> (ø)
rust/storage/src/hummock/compactor.rs 2.43% <ø> (ø)
rust/storage/src/hummock/iterator/merge.rs 100.00% <ø> (ø)
rust/storage/src/hummock/iterator/reverse_merge.rs 100.00% <ø> (ø)
rust/storage/src/hummock/iterator/reverse_user.rs 99.19% <ø> (-0.61%) ⬇️
rust/storage/src/hummock/iterator/user.rs 94.08% <ø> (-0.26%) ⬇️
rust/storage/src/hummock/state_store.rs 0.00% <0.00%> (ø)
rust/storage/src/panic_store.rs 0.00% <0.00%> (ø)
... and 31 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4fa67cf...09acbfd. Read the comment docs.

@hzxa21 hzxa21 marked this pull request as ready for review January 30, 2022 15:12
@hzxa21 hzxa21 force-pushed the patrick/async-flush-part1 branch from c1742e7 to 782fa8d Compare January 30, 2022 15:14
Copy link
Contributor

@skyzh skyzh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, some small changes commented below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should ignore the first key? If we include last key in the scanning key range, the last key will also appear in the buf.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scan_with_start_key uses exclusive start_key so the last key will not appear in the buf.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function name doesn't seem straightforward 🤪

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to rename it to scan_with_exclusive_start_key

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed scan_with_start_key to take an inclusive start_key. The caller will calculate the next_key.

@skyzh skyzh changed the title State Async Flush part1: refactor hummock/mview iterator and make all read interfaces epoch-aware refactor(storage): make iterator interfaces epoch-aware Feb 7, 2022
@hzxa21 hzxa21 force-pushed the patrick/async-flush-part1 branch from 782fa8d to 57c4c25 Compare February 7, 2022 14:51
Copy link
Contributor

@MrCroxx MrCroxx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LSTM

@hzxa21 hzxa21 force-pushed the patrick/async-flush-part1 branch from 77cbdb8 to 35488d4 Compare February 9, 2022 05:51
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why need this? When there is no data from state store, it will return EOF by nature.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. TableIter only has next method, no valid method so it relies on calling next to know whether there is any data (Ok((None) means EOF)).
  2. We don't want unnecessary seek to happen when there is no more data or there is a non-retriable error

@hzxa21 hzxa21 force-pushed the patrick/async-flush-part1 branch from f35eb76 to 09acbfd Compare February 9, 2022 16:18
@hzxa21 hzxa21 merged commit 8605718 into main Feb 9, 2022
@hzxa21 hzxa21 deleted the patrick/async-flush-part1 branch February 9, 2022 16:48
type Iter<'a> = MemoryStateStoreIter;

async fn get(&self, key: &[u8]) -> Result<Option<Bytes>> {
async fn get(&self, key: &[u8], _epoch: u64) -> Result<Option<Bytes>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InMemory engine should be epoch-aware. Otherwise it will be hard to write unit tests for checkpointing logic -- it's impossible to start a MinIO instance when running unit tests.

if self.buf.is_empty() {
self.buf = self.keyspace.scan(Some(limit), self.epoch).await?
} else {
let next_key = next_key(self.buf.last().unwrap().0.to_vec().as_slice());
Copy link
Contributor

@skyzh skyzh Feb 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g., we have

  • a
  • ab
  • bc

the last key we get is a. next_key(a) = b. We will miss ab.

#[allow(dead_code)]
pub async fn iterate_store(&self) -> Result<Vec<(A::OwnedItem, ExtremePk)>> {
let all_data = self.keyspace.scan_strip_prefix(None).await?;
let epoch = u64::MAX;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've disabled two tests in previous PRs. May need to fix them by removing #[ignore].

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants