fix(connector): fix start_offset and empty range #7112

ZENOTME · 2022-12-29T07:03:04Z

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

fix:

process start offset correctly
check empty range correctly
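
As a rough illustration of the second item, the empty-range condition can be written as a standalone predicate. This is only a sketch: `is_empty_range` is a hypothetical helper, and it assumes offsets are `i64` values with an optional `start_offset`, mirroring the two conditions in the `reader.rs` diff reviewed in this PR rather than the final merged code.

```rust
/// Hypothetical helper, not part of the PR: returns true when the diff's
/// two conditions would make the reader yield an empty batch.
/// Assumption: offsets are `i64` and `start_offset` may be unset.
fn is_empty_range(start_offset: Option<i64>, stop_offset: i64) -> bool {
    // Case 1: an explicit start offset that already equals the stop offset.
    // Case 2: a stop offset of 0, i.e. nothing has been written yet.
    start_offset == Some(stop_offset) || stop_offset == 0
}
```

For example, `is_empty_range(Some(5), 5)` and `is_empty_range(None, 0)` are both true, while `is_empty_range(Some(2), 5)` is false.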

Checklist

I have written necessary rustdoc comments
I have added necessary unit tests and integration tests
All checks passed in ./risedev check (or alias, ./risedev c)


tabVersion · 2022-12-29T07:19:41Z

src/connector/src/source/kafka/source/reader.rs

+            if let Some(start_offset) = self.start_offset && start_offset == stop_offset {
+                yield Vec::new();
+                return Ok(());
+            } else if stop_offset == 0 {
+                yield Vec::new();
+                return Ok(());
+            }


It returns nothing. Consider refusing to create a reader in such a case.

ZENOTME · 2022-12-29

If we refuse to create a reader and return an error, there are two options:

We directly report an error if the range returns nothing. This behavior seems odd, because we do not report an error when a table is empty.

We propagate the error from bottom to top and check for it there. Because SplitReaderImpl::create returns an AnyError, we would have to recognize the error by comparing strings, and I'm not sure whether that is a good way.
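
The string-comparison concern in the second option can be sidestepped if the type-erased error supports downcasting. The sketch below uses only the standard library: `EmptyRangeError`, `BoxedError`, `create_reader`, and `is_empty_range_error` are hypothetical names, and `BoxedError` is a stand-in for `AnyError`. Whether `AnyError` itself allows downcasting is an assumption not confirmed in this thread.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical marker error for a split whose offset range is empty.
#[derive(Debug)]
struct EmptyRangeError {
    start_offset: i64,
    stop_offset: i64,
}

impl fmt::Display for EmptyRangeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "empty offset range ({}, {})", self.start_offset, self.stop_offset)
    }
}

impl Error for EmptyRangeError {}

/// Stand-in for a type-erased error such as the `AnyError` mentioned above.
type BoxedError = Box<dyn Error + Send + Sync + 'static>;

/// Hypothetical reader constructor that refuses an empty range.
fn create_reader(start_offset: i64, stop_offset: i64) -> Result<(), BoxedError> {
    if start_offset == stop_offset {
        return Err(Box::new(EmptyRangeError { start_offset, stop_offset }));
    }
    Ok(())
}

/// Recognize the empty-range case by its type instead of its message text.
fn is_empty_range_error(err: &BoxedError) -> bool {
    err.downcast_ref::<EmptyRangeError>().is_some()
}
```

A caller higher up could then match on `is_empty_range_error(&err)` and turn that one case into an empty result while letting every other error through.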

tabVersion · 2022-12-29

"We directly report error if the range return nothing."

I vote for this solution. A table is an internal concept and is easy to handle, but querying Kafka has more overhead than querying a table. If we know deterministically that the result is empty, why waste resources on it?
@liurenjie1024 @fuyufjh what do you think?

ZENOTME · 2022-12-30

We create one batch source executor per split, which means we cannot create a source executor if every split is empty and all of them are filtered out. So for now we directly return an error in this case.
This behavior may cause a problem: if we left join a table with an empty source, e.g. select * from table left join empty_source, the query will return an error. But I think this scenario is rare...

If we want the user to see an empty result instead of an error, maybe we can:

create an empty batch value executor
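
One way to picture this fallback is a planning step that drops empty splits and, when none remain, substitutes a node that emits no rows. This is a sketch under stated assumptions: `Split`, `BatchPlan`, and `plan_source_scan` are hypothetical names invented for illustration, and the half-open offset range is an assumption, not RisingWave's actual types.

```rust
/// Hypothetical split descriptor; assumes a half-open offset range.
#[derive(Debug, Clone, PartialEq)]
struct Split {
    partition: i32,
    start_offset: i64,
    stop_offset: i64,
}

/// Hypothetical plan node: scan the remaining splits, or emit no rows at all.
#[derive(Debug, PartialEq)]
enum BatchPlan {
    SourceScan(Vec<Split>),
    EmptyValues,
}

/// Filter out empty splits; fall back to `EmptyValues` if nothing is left.
fn plan_source_scan(splits: Vec<Split>) -> BatchPlan {
    let non_empty: Vec<Split> = splits
        .into_iter()
        .filter(|s| s.start_offset < s.stop_offset)
        .collect();
    if non_empty.is_empty() {
        BatchPlan::EmptyValues
    } else {
        BatchPlan::SourceScan(non_empty)
    }
}
```

With this shape, a left join against an empty source would see zero rows from the right side instead of failing.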

@liurenjie1024 @fuyufjh

liurenjie1024 · 2022-12-30

I don't think we should return an error here; it feels like an abuse of error reporting to me, since empty results are common in queries. I think just returning an empty vec is good enough.
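
A minimal sketch of the "return an empty vec" behavior, using an in-memory slice in place of a Kafka partition. `read_range` is a hypothetical function, and the inclusive start and exclusive stop offsets are assumptions made for the example, not the connector's documented semantics.

```rust
/// Hypothetical in-memory stand-in for reading one partition in batches.
/// Assumes `start_offset` is inclusive and `stop_offset` is exclusive.
fn read_range(
    log: &[String],
    start_offset: usize,
    stop_offset: usize,
    batch_size: usize,
) -> Vec<Vec<String>> {
    // Never read past the end of the log.
    let stop = stop_offset.min(log.len());
    if start_offset >= stop {
        // Empty range: hand back one empty batch instead of an error.
        return vec![Vec::new()];
    }
    log[start_offset..stop]
        .chunks(batch_size.max(1)) // guard against a zero batch size
        .map(|chunk| chunk.to_vec())
        .collect()
}
```

Downstream code then treats the empty range like any other query that matches no rows.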

codecov · 2022-12-29T07:29:38Z

Codecov Report

Merging #7112 (1954dc4) into main (7de49c0) will decrease coverage by 0.00%.
The diff coverage is 18.18%.

❗ Current head 1954dc4 differs from pull request most recent head aac5f1b. Consider uploading reports for the commit aac5f1b to get more accurate results

@@            Coverage Diff             @@
##             main    #7112      +/-   ##
==========================================
- Coverage   73.15%   73.15%   -0.01%     
==========================================
  Files        1052     1052              
  Lines      167399   167424      +25     
==========================================
+ Hits       122467   122477      +10     
- Misses      44932    44947      +15

| Flag | Coverage Δ | |
|---|---|---|
| rust | `73.15% <18.18%> (-0.01%)` | ⬇️ |


| Impacted Files | Coverage Δ | |
|---|---|---|
| src/common/src/error.rs | `70.55% <ø> (ø)` | |
| src/connector/src/error.rs | `0.00% <ø> (ø)` | |
| ...rc/connector/src/source/kafka/enumerator/client.rs | `0.00% <0.00%> (ø)` | |
| src/connector/src/source/kafka/source/reader.rs | `0.00% <0.00%> (ø)` | |
| src/source/src/connector_source.rs | `69.11% <42.85%> (-1.34%)` | ⬇️ |
| src/meta/src/hummock/mock_hummock_meta_client.rs | `64.21% <0.00%> (-1.06%)` | ⬇️ |
| src/common/src/types/ordered_float.rs | `31.05% <0.00%> (-0.20%)` | ⬇️ |
| src/storage/src/hummock/compactor/mod.rs | `83.10% <0.00%> (-0.16%)` | ⬇️ |
| src/meta/src/hummock/manager/mod.rs | `79.37% <0.00%> (-0.12%)` | ⬇️ |
| src/common/src/cache.rs | `97.54% <0.00%> (+0.22%)` | ⬆️ |

... and 2 more


ZENOTME · 2022-12-30T08:28:58Z

This PR is included in #7150

github-actions bot added the type/fix Type: Bug fix. Only for pull requests. label Dec 29, 2022

ZENOTME requested review from shanicky, tabVersion and wangrunji0408 December 29, 2022 07:04

fix to read from source reader according (start_offset,stop_offset) c…

aac5f1b

…orrectly.

ZENOTME force-pushed the zj/fix_reader branch from d78b9ee to aac5f1b Compare December 29, 2022 07:08

tabVersion reviewed Dec 29, 2022


ZENOTME force-pushed the zj/fix_reader branch 3 times, most recently from 930428c to aac5f1b Compare December 30, 2022 07:00

ZENOTME requested a review from liurenjie1024 December 30, 2022 07:04

ZENOTME closed this Dec 30, 2022

ZENOTME mentioned this pull request Jan 3, 2023

feat: Initial support for kafka timestamp pushdown. #7150

Merged

3 tasks

ZENOTME deleted the zj/fix_reader branch January 4, 2023 08:31


4 participants
