Skip to content

Conversation

@chenzl25
Copy link
Contributor

@chenzl25 chenzl25 commented Nov 16, 2022

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

PLEASE DO NOT LEAVE THIS EMPTY !!!

Please explain IN DETAIL what the changes are in this PR and why they are needed:

  • Support UNION ALL for streaming queries.
  • The most critical part of streaming union is that we need to rewrite all the inputs of the union operator to contain pk while keeping all the input schema being the same.
  • We add a new column source_col for rewriting the union operator to identify the record that came from the input source and make it a part of the union pks.
  • This PR only supports UNION ALL, but not UNION. I think all the UNION should be transformed into UNION ALL by optimizer in the later PR to avoid the handcrafted streaming rewriting.
  • This PR we enforce all the inputs of union with single distribution. I think we can optimize it with other distributions (like round robin) in the future.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

If your pull request contains user-facing changes, please specify the types of the changes, and create a release note. Otherwise, please feel free to remove this section.

Types of user-facing changes

Please keep the types that apply to your changes, and remove those that do not apply.

  • SQL operators

Release note

Please create a release note for your changes. In the release note, focus on the impact on users, and mention the environment or conditions where the impact may occur.

Refer to a related PR or issue link (optional)

#6392

@chenzl25 chenzl25 added user-facing-changes Contains changes that are visible to users type/feature Type: New feature. labels Nov 16, 2022
@codecov
Copy link

codecov bot commented Nov 16, 2022

Codecov Report

Merging #6397 (85a8724) into main (9e9e905) will decrease coverage by 0.04%.
The diff coverage is 69.62%.

@@            Coverage Diff             @@
##             main    #6397      +/-   ##
==========================================
- Coverage   74.05%   74.00%   -0.05%     
==========================================
  Files         972      975       +3     
  Lines      158148   158684     +536     
==========================================
+ Hits       117115   117436     +321     
- Misses      41033    41248     +215     
Flag Coverage Δ
rust 74.00% <69.62%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/compute/src/server.rs 0.00% <0.00%> (ø)
src/frontend/src/handler/create_source.rs 66.25% <0.00%> (-3.75%) ⬇️
src/frontend/src/optimizer/plan_node/mod.rs 85.49% <ø> (ø)
src/source/src/lib.rs 90.00% <ø> (ø)
src/source/src/manager.rs 79.38% <0.00%> (-0.25%) ⬇️
src/source/src/parser/avro/parser.rs 74.73% <ø> (ø)
src/source/src/parser/avro/schema_resolver.rs 4.05% <ø> (ø)
src/source/src/parser/common.rs 63.63% <ø> (ø)
src/source/src/parser/mod.rs 80.14% <0.00%> (-1.16%) ⬇️
src/source/src/parser/protobuf/parser.rs 54.16% <ø> (ø)
... and 29 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more


impl<PlanRef: GenericPlanRef> GenericPlanNode for Union<PlanRef> {
fn schema(&self) -> Schema {
self.inputs[0].schema().clone()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add the hidden column here?

Copy link
Contributor Author

@chenzl25 chenzl25 Nov 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not. The input schema already contains the hidden column and we can find them by source_col.

Copy link
Contributor

@st1page st1page left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM. We can find a better way when we move the pk rewriting only on stream plan 🥵

@mergify mergify bot merged commit 5a89472 into main Nov 18, 2022
@mergify mergify bot deleted the dylan/support_streaming_union branch November 18, 2022 05:00
xxchan pushed a commit that referenced this pull request Nov 19, 2022
* support union all for streaming query

* fmt

* add stream_dist_plan for union

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type/feature Type: New feature. user-facing-changes Contains changes that are visible to users

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants