
Conversation

@BugenZhao (Member) commented Feb 15, 2022

What's changed and what's your intention?

This PR fixes some issues and makes distributed table_v2 insertion & scanning work!

Insertion

  • Currently, row_id is generated by the local source on each compute node. Since table_v2 data is now stored on shared storage, the row_id is prefixed with the worker_id to keep it globally unique (see the sketch after this list).
  • cc @wyhyhyhyh
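
As a rough illustration of the prefixing scheme (a minimal sketch only; the actual bit layout and types in this PR may differ), the worker_id can be packed into the high bits of each generated row_id:

```rust
/// Hypothetical sketch of worker-prefixed row-id generation; the real
/// bit widths and field types in this PR may differ.
struct RowIdGenerator {
    worker_id: u32, // unique per compute node
    next_seq: u64,  // node-local monotonic sequence
}

impl RowIdGenerator {
    /// Pack the worker id into the high bits so that ids generated on
    /// different compute nodes can never collide on shared storage.
    fn next(&mut self) -> u64 {
        let seq = self.next_seq;
        self.next_seq += 1;
        ((self.worker_id as u64) << 48) | (seq & ((1u64 << 48) - 1))
    }
}
```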

Scanning

  • Previously, Bummock table data was stored on each compute node in a shared-nothing manner, so distributed scanning was simply a full scan on every compute node.
  • In the future, we will implement partitioned (range) reads on shared storage (Hummock), so the current distributed batch plan still holds.
  • In this PR, we implement a fake partition-read: only the first compute node accesses the storage and yields all chunks (see the sketch after this list).
  • cc @TennyZhuang
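
The fake partition-read can be pictured as follows. This is a self-contained sketch under assumed names (FIRST_WORKER_ID and partition_scan are illustrative, not the PR's actual code): one designated node scans everything, the rest yield nothing, so the union of all partitions is still exactly one full scan.

```rust
/// Hypothetical id of the designated node; in the PR it is simply the
/// first compute node.
const FIRST_WORKER_ID: u32 = 0;

/// Sketch of the fake partition-read: only the designated node reads
/// from shared storage, so each row is emitted exactly once cluster-wide.
fn partition_scan(worker_id: u32, all_rows: Vec<String>) -> Vec<String> {
    if worker_id == FIRST_WORKER_ID {
        all_rows // the first node yields every chunk
    } else {
        Vec::new() // every other node yields an empty result
    }
}
```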

Storage

  • Periodic version syncing for Hummock's local version manager is not implemented yet. After one compute node updates the storage, the other nodes are not aware of the latest version, which leads to wrong scanning results.
  • In this PR, we extract a function named update_local_version and expose it in the StateStore trait. Before creating an MViewTableIter, this function is called manually to fetch the latest version (see the sketch after this list).
  • Note that this is only a workaround and there may be concurrency races. We should remove it after Hummock is refined.
  • cc @zwang28 @hzxa21
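
A sketch of the workaround is below. The names update_local_version and MViewTableIter come from the description above, but the signatures are illustrative assumptions, not the real risingwave-storage API (async fn in traits also requires Rust 1.75+; the original code likely spelled the futures out differently).

```rust
/// Sketch only: the signature is an assumption, not the real trait.
trait StateStore {
    /// Pull the latest version from the Hummock version manager so that a
    /// following scan observes writes committed by other compute nodes.
    async fn update_local_version(&self);
}

struct MViewTable<S: StateStore> {
    store: S,
}

impl<S: StateStore> MViewTable<S> {
    /// Refresh the shared-storage version before handing out an iterator;
    /// with a stale local version, recent inserts from other nodes would
    /// be invisible to this scan.
    async fn iter(&self) /* -> MViewTableIter */ {
        self.store.update_local_version().await;
        // ... construct the MViewTableIter over the refreshed version ...
    }
}
```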

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

Close #311.

codecov bot commented Feb 15, 2022

Codecov Report

Merging #334 (6c644f1) into main (c5c82e6) will decrease coverage by 0.00%.
The diff coverage is 68.33%.


@@             Coverage Diff              @@
##               main     #334      +/-   ##
============================================
- Coverage     74.18%   74.17%   -0.01%     
  Complexity     2679     2679              
============================================
  Files           861      862       +1     
  Lines         48666    48707      +41     
  Branches       1591     1591              
============================================
+ Hits          36104    36130      +26     
- Misses        11749    11764      +15     
  Partials        813      813              
Flag   Coverage           Δ
java   61.97% <ø>         (ø)
rust   79.43% <68.33%>    (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                                 Coverage           Δ
rust/server/tests/row_seq_scan.rs              85.96% <ø>         (ø)
rust/server/tests/table_v2_materialize.rs      95.23% <ø>         (ø)
rust/storage/src/bummock/table.rs              80.53% <0.00%>     (-0.72%) ⬇️
rust/storage/src/hummock/mod.rs                39.64% <0.00%>     (-0.72%) ⬇️
rust/storage/src/hummock/state_store.rs        0.00% <0.00%>      (ø)
rust/storage/src/table/mod.rs                  80.55% <ø>         (ø)
rust/stream/src/task/env.rs                    27.77% <0.00%>     (-7.01%) ⬇️
rust/batch/src/executor/row_seq_scan.rs        55.00% <50.00%>    (-1.67%) ⬇️
rust/storage/src/table/mview.rs                69.16% <66.66%>    (-0.67%) ⬇️
rust/common/src/worker_id.rs                   75.00% <75.00%>    (ø)
... and 8 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c5c82e6...6c644f1.

@BugenZhao BugenZhao changed the title fix(batch): distributed table_v2 insertion & scanning feat(batch): distributed table_v2 insertion & scanning Feb 15, 2022
@BugenZhao BugenZhao changed the title feat(batch): distributed table_v2 insertion & scanning feat(batch): distributed table_v2 insertion & scanning on Hummock Feb 15, 2022
@BugenZhao BugenZhao changed the title feat(batch): distributed table_v2 insertion & scanning on Hummock feat(batch): distributed table_v2 insertion & scanning for Hummock Feb 15, 2022
@BugenZhao BugenZhao changed the title feat(batch): distributed table_v2 insertion & scanning for Hummock feat(batch): distributed table_v2 insertion & scanning Feb 15, 2022
@github-actions github-actions bot added the type/feature Type: New feature. label Feb 15, 2022
@BugenZhao BugenZhao marked this pull request as ready for review February 15, 2022 09:19
@fuyufjh (Collaborator) left a comment

LGTM.

@skyzh (Contributor) left a comment

Rest LGTM

exporter-port: 1224
- use: frontend
- use: minio


minio should be started first.

@wyhyhyhyh (Contributor) commented

Will the tuple id be duplicated when two materialized-view nodes are placed on the same worker?

@wyhyhyhyh (Contributor) commented

LGTM in general.
I am just not sure whether using the worker id as a prefix will work.

@BugenZhao (Member, Author) commented

> Will the tuple id be duplicated when two materialized-view nodes are placed on the same worker?

There is only one SourceImpl in memory :).
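
In other words, two materialized-view nodes scheduled on the same worker would share the single in-memory source, and therefore one sequence counter. A hedged, self-contained illustration (not PR code):

```rust
use std::sync::{Arc, Mutex};

// Two handles to one shared counter stand in for two MV nodes sharing
// the single SourceImpl on a worker: the shared sequence never repeats.
fn main() {
    let counter = Arc::new(Mutex::new(0u64));
    let (node_a, node_b) = (counter.clone(), counter.clone());
    let id_a = { let mut c = node_a.lock().unwrap(); *c += 1; *c };
    let id_b = { let mut c = node_b.lock().unwrap(); *c += 1; *c };
    assert_ne!(id_a, id_b);
}
```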

@BugenZhao BugenZhao merged commit 5facbfd into main Feb 15, 2022
@BugenZhao BugenZhao deleted the bz/distributed-insert-row-id branch February 15, 2022 11:01

Labels

type/feature Type: New feature.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

batch: 3-node with table-v2

5 participants