
Conversation

Contributor

@HuaHuaY HuaHuaY commented Feb 25, 2022

What's changed and what's your intention?

Support catalog update by NotificationService.

Note

There is a lock contention problem here. When we create a database and access it immediately, the ObserverManager's update code and the immediate-access code compete for the lock on CatalogCache at the same time.

In the existing code, this may cause the frontend to fail to start, reporting an error that a database named "dev" does not exist. This is because the frontend creates a database named "dev" and then immediately creates a schema under it (although we will move this creation to meta later, it illustrates the contention for the lock).

I moved the lock acquisition in the ObserverManager earlier, but that does not always work.
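
For illustration, a minimal self-contained sketch of the contention, using a stand-in CatalogCache backed by a plain HashSet (the real types live in the frontend crate):

use std::collections::HashSet;
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct CatalogCache {
    databases: HashSet<String>, // stand-in for the real catalog structures
}

#[tokio::main]
async fn main() {
    let cache = Arc::new(Mutex::new(CatalogCache::default()));

    // ObserverManager-like task: applies the Create notification from meta.
    let observer_cache = cache.clone();
    let observer = tokio::spawn(async move {
        observer_cache.lock().unwrap().databases.insert("dev".into());
    });

    // Frontend startup path: accesses the database right after creating it.
    // Whether it sees "dev" depends on which side grabs the lock first.
    let found = cache.lock().unwrap().databases.contains("dev");
    println!("dev visible immediately: {found}");

    observer.await.unwrap();
}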

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

related #361

@github-actions github-actions bot added the type/feature Type: New feature. label Feb 25, 2022

codecov bot commented Feb 25, 2022

Codecov Report

Merging #567 (bc5962a) into main (9ddfe63) will increase coverage by 0.27%.
The diff coverage is 82.94%.


@@             Coverage Diff              @@
##               main     #567      +/-   ##
============================================
+ Coverage     71.28%   71.56%   +0.27%     
  Complexity     2706     2706              
============================================
  Files           895      895              
  Lines         51238    51538     +300     
  Branches       1730     1730              
============================================
+ Hits          36526    36881     +355     
+ Misses        13897    13842      -55     
  Partials        815      815              
Flag Coverage Δ
java 59.86% <ø> (ø)
rust 76.56% <82.94%> (+0.35%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
rust/meta/src/cluster/mod.rs 77.88% <ø> (+18.26%) ⬆️
rust/rpc_client/src/meta_client.rs 57.38% <70.83%> (+16.86%) ⬆️
rust/frontend/src/catalog/catalog_service.rs 86.82% <71.69%> (-11.01%) ⬇️
rust/frontend/src/observer/observer_manager.rs 85.18% <82.60%> (+85.18%) ⬆️
rust/frontend/src/test_utils.rs 85.18% <84.21%> (+5.18%) ⬆️
rust/meta/src/manager/catalog.rs 75.62% <88.46%> (+3.97%) ⬆️
rust/frontend/src/catalog/database_catalog.rs 92.30% <90.90%> (-1.81%) ⬇️
rust/frontend/src/scheduler/schedule.rs 89.84% <100.00%> (+7.92%) ⬆️
rust/frontend/src/session.rs 98.43% <100.00%> (+4.68%) ⬆️
rust/meta/src/rpc/server.rs 91.17% <100.00%> (ø)
... and 11 more


Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9ddfe63...bc5962a.

Collaborator

@fuyufjh fuyufjh left a comment

Generally LGTM

/// `update_database` is called in `start` method.
/// It calls `create_database` and `drop_database` of `CatalogCache`.
fn update_database(&self, operation: Operation, database: Database) {
debug!(
Collaborator

Recommend using info!

Comment on lines 202 to 204
Operation::Delete => catalog_cache_guard
.drop_table(&db_name, &schema_name, &table.table_name)
.unwrap(),
Collaborator

So even for the Delete operation, the meta server needs to send the full description of the object instead of the name only?

Contributor Author

@HuaHuaY HuaHuaY Feb 28, 2022

So even for the Delete operation, the meta server needs to send the full description of the object instead of the name only?

Actually, sending an object is also not easy for create/drop, because we should use ***_ref_id to find the name of the database or schema. It just simplifies the proto file.


let mut observer_manager = ObserverManager::new(meta_client.clone(), host).await;

let catalog_cache = Arc::new(Mutex::new(CatalogCache::new()));
Member

We should fetch the catalog from meta first to initialize CatalogCache, and if the dev database/schema already exists, skip creating them.

Contributor Author

We should fetch the catalog from meta first to initialize CatalogCache, and if the dev database/schema already exists, skip creating them.

I am considering initializing the dev database/schema when the meta server starts. I plan to move the init code to meta later.

Member

Yes, but for dev and other catalogs that already exist in the cluster, there is no create operation when meta starts or the frontend reboots.

Contributor Author

@HuaHuaY HuaHuaY Feb 28, 2022

Yes. We should call the GetCatalog RPC when the frontend starts.


fn create_schema(
pub fn get_database_name(&self, db_id: DatabaseId) -> Option<String> {
Some(self.db_name_by_id.get(&db_id)?.clone())
Member

Suggested change
Some(self.db_name_by_id.get(&db_id)?.clone())
self.db_name_by_id.get(&db_id).cloned()

Contributor Author

Suggested change
Some(self.db_name_by_id.get(&db_id)?.clone())
self.db_name_by_id.get(&db_id).cloned()

self.db_name_by_id.get(&db_id).cloned() will create an Option<&String>, and the &String requires the lifetime of &self to be longer than its own. Then, while the &String exists, we can't get a &mut self to update the schema.

Member

It's cloned instead of clone.

Contributor Author

It's cloned instead of clone.

Sorry, it's my fault. I will change it, thanks.
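
For reference, a minimal self-contained illustration with a stand-in Catalog type, showing that the two forms return the same owned Option<String>:

use std::collections::HashMap;

struct Catalog {
    db_name_by_id: HashMap<u32, String>,
}

impl Catalog {
    // Original form.
    fn get_database_name_v1(&self, db_id: u32) -> Option<String> {
        Some(self.db_name_by_id.get(&db_id)?.clone())
    }

    // Suggested form: `cloned()` turns Option<&String> into Option<String>,
    // so the returned value does not borrow `self`.
    fn get_database_name_v2(&self, db_id: u32) -> Option<String> {
        self.db_name_by_id.get(&db_id).cloned()
    }
}

fn main() {
    let mut db_name_by_id = HashMap::new();
    db_name_by_id.insert(1u32, "dev".to_string());
    let catalog = Catalog { db_name_by_id };
    assert_eq!(catalog.get_database_name_v1(1), catalog.get_database_name_v2(1));
}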

let catalog_cache = Arc::new(Mutex::new(CatalogCache::new()));
let catalog_manager = CatalogConnector::new(meta_client.clone(), catalog_cache.clone());

observer_manager.set_catalog_cache(catalog_cache);
Member

Can we directly pass this catalog_cache to ObserverManager::new? Ditto for WorkerNodeManager.

Member

+1

Contributor

+1

Contributor Author

Can we directly pass this catalog_cache to ObserverManager::new? Ditto for WorkerNodeManager.

I think it is not easy to collect all the members of ObserverManager at construction time. Maybe we will pass a reference to ObserverManager around to set some members in the future. We can change this code later.

Member

@BugenZhao BugenZhao Feb 28, 2022

Why? There must be some way to initialize these managers one by one.

Contributor Author

Ok, I will update the code in the next commit.

.worker_node_manager
.as_ref()
.expect("forget to call set_worker_node_manager before call start");
.expect("forget to call `set_worker_node_manager` before call start");
Member

Seems set_worker_node_manager is unused now?

Contributor Author

Seems set_worker_node_manager is unused now?

Yes. It seems that BatchScheduler, which contains WorkerNodeManager, is not used.

Contributor

@BowenXiao1999 BowenXiao1999 left a comment

Generally LGTM. So one frontend node will have exactly one ObserverManager?

BTW, I suggest adding a tracking issue for this project to make the roadmap clearer.

Contributor Author

HuaHuaY commented Feb 28, 2022

Generally LGTM. So one frontend node will have exactly one ObserverManager?

BTW, I suggest adding a tracking issue for this project to make the roadmap clearer.

Yes. Now there is only one ObserverManager in one frontend node.

Contributor Author

HuaHuaY commented Feb 28, 2022

I added a member state: CatalogState to Database/Schema/Table, which represents the state of the current catalog item: whether it has just been created and is not yet synced, or has already been synced.

For example, when the frontend creates a database, it calls the Create RPC first. Then it adds a database with state CatalogState::Standalone to CatalogCache. At the same time, ObserverManager tries to add a database with state CatalogState::Synchronous to CatalogCache. Both of them call CatalogCache::create_database.
If create_database is called with Standalone first and Synchronous second, it just changes the state of the database from Standalone to Synchronous. If Synchronous comes first and Standalone second, it does nothing.

And there is a problem that I will solve in the future. This state is only designed for the race described above, not for duplicate data. If we create a database in the frontend and the sync finishes, the state of the database is Synchronous. If we then create the same database in the frontend again, create_database is called with Standalone first, does nothing, and does not report an error. I will add a duplicate check in meta in the future.
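
A minimal sketch of the state handling described here, with stand-in types and field names (the real CatalogCache keeps more metadata per database):

use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Debug)]
enum CatalogState {
    Standalone,  // created locally, not yet confirmed by a notification
    Synchronous, // confirmed by an ObserverManager notification
}

#[derive(Default)]
struct CatalogCache {
    databases: HashMap<String, CatalogState>,
}

impl CatalogCache {
    fn create_database(&mut self, name: &str, state: CatalogState) {
        match self.databases.get_mut(name) {
            Some(existing) => {
                // A Standalone entry is upgraded once the Synchronous notification
                // arrives; any other combination is ignored (this also swallows
                // duplicates, the follow-up problem mentioned above).
                if *existing == CatalogState::Standalone && state == CatalogState::Synchronous {
                    *existing = CatalogState::Synchronous;
                }
            }
            None => {
                self.databases.insert(name.to_string(), state);
            }
        }
    }
}

fn main() {
    let mut cache = CatalogCache::default();
    cache.create_database("dev", CatalogState::Standalone);  // local create
    cache.create_database("dev", CatalogState::Synchronous); // notification
    assert_eq!(cache.databases["dev"], CatalogState::Synchronous);
}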

Contributor

I'm wondering whether the state is necessary. I remember that in the previous code, create table would just call the RPC and do nothing in the cache (?), waiting for ObserverManager to update the local database (?). Any reason to introduce this?

Contributor Author

HuaHuaY commented Mar 1, 2022

I'm wondering whether the state is necessary. I remember that in the previous code, create table would just call the RPC and do nothing in the cache (?), waiting for ObserverManager to update the local database (?). Any reason to introduce this?

In the first commit, I deleted the operations that modify CatalogCache. There is a problem: if I create a table and then read it, the created database may not have been updated by ObserverManager in time, because the Create RPC returns when meta has sent the notification successfully, not when the frontend has read the notification. I would get a "database not found" error. So in the second commit, I added back the code that modifies the CatalogCache. Then there were two pieces of code creating the same database, and I had to solve the duplicate-database problem.

Contributor

Or maybe meta should wait until all observer managers have successfully updated? Not sure.

Contributor Author

HuaHuaY commented Mar 1, 2022

Or maybe meta should wait until all observer managers have successfully updated? Not sure.

It's a good idea and it could work. It just requires the frontend to design another RPC message to send to meta.

I discussed this with @yezizp2012. We decided to use the node_id of CreateRequest to filter out the node that sent the message when meta sends notifications. Maybe I should change the member of NotificationManager from DashMap<WorkerKey, Sender<Notification>> to DashMap<WorkerId, Sender<Notification>>, change the design of SubscribeRequest, save the return value of the AddWorkerNode RPC, and so on.
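
A minimal sketch of this filtering idea, with stand-in types and a hypothetical notify_except helper (a plain HashMap stands in for the DashMap):

use std::collections::HashMap;
use tokio::sync::mpsc::Sender;

type WorkerId = u32;
type Notification = String; // stand-in for the real notification payload

struct NotificationManager {
    senders: HashMap<WorkerId, Sender<Notification>>,
}

impl NotificationManager {
    async fn notify_except(&self, origin: WorkerId, notification: Notification) {
        for (worker_id, tx) in &self.senders {
            if *worker_id == origin {
                continue; // the originating frontend updates its cache locally
            }
            // Send errors are ignored in this sketch; see the error-handling
            // discussion further down.
            let _ = tx.send(notification.clone()).await;
        }
    }
}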

@HuaHuaY HuaHuaY force-pushed the zehua/support_catalog_update branch from 6cc324e to 3b780c7 Compare March 2, 2022 02:48
Contributor Author

HuaHuaY commented Mar 2, 2022

After discussions at yesterday's meeting, we decided to go with this option:
When CatalogConnector calls create_database, it first sends a Create RPC to meta. Then it calls catalog_cache.get_database in a loop until the return value is not None, i.e., it waits for ObserverManager to create the database.
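
A minimal sketch of this approach, with stand-in types and the Create RPC elided (the real code lives in CatalogConnector and uses the project's error types):

use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::time::Duration;

#[derive(Default)]
struct CatalogCache {
    databases: HashSet<String>,
}

impl CatalogCache {
    fn get_database(&self, name: &str) -> Option<String> {
        self.databases.get(name).cloned()
    }
}

async fn create_database(cache: Arc<Mutex<CatalogCache>>, db_name: &str) {
    // 1. Send the Create RPC to meta (elided in this sketch).

    // 2. Wait until ObserverManager has applied the notification and the
    //    database shows up in the local cache.
    while cache.lock().unwrap().get_database(db_name).is_none() {
        tokio::time::sleep(Duration::from_micros(10)).await;
    }
}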

pub async fn new(
client: MetaClient,
addr: SocketAddr,
catalog_cache: Arc<Mutex<CatalogCache>>,
Member

Missing worker node manager parameter.

while self
.catalog_cache
.lock()
.get_table(db_name, schema_name, &table.table_name)
Member

It still looks strange to me to identify a table by its name :(, and it seems there could be race cases.
Considering that the create_table RPC returns a TableId, how about waiting for the occurrence of this id? Note that the id is always guaranteed to be unique.

Contributor Author

@HuaHuaY HuaHuaY Mar 2, 2022

It still looks strange to me to identify a table by its name :(, and it seems there could be race cases.
Considering that the create_table RPC returns a TableId, how about waiting for the occurrence of this id? Note that the id is always guaranteed to be unique.

When would there be race cases? If two frontends create the same table in the same schema and database, I think we should reject it in meta, although this part has not been implemented yet.

Member

I'm not sure about this 🤣. However, I believe looking the table up by its id will be more robust.
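
A minimal sketch of this suggestion, with stand-in types and a hypothetical get_table_by_id lookup; waiting on the unique TableId returned by the create_table RPC avoids relying on names:

use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::Duration;

type TableId = u64;

#[derive(Default)]
struct CatalogCache {
    tables_by_id: HashMap<TableId, String>, // id -> table name, as a stand-in
}

impl CatalogCache {
    fn get_table_by_id(&self, id: TableId) -> Option<String> {
        self.tables_by_id.get(&id).cloned()
    }
}

async fn wait_for_table(cache: Arc<Mutex<CatalogCache>>, table_id: TableId) {
    // `table_id` would come from the create_table RPC response.
    while cache.lock().unwrap().get_table_by_id(table_id).is_none() {
        tokio::time::sleep(Duration::from_micros(10)).await;
    }
}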

rx: Streaming<SubscribeResponse>,
worker_node_manager: Option<WorkerNodeManagerRef>,
worker_node_manager: Option<WorkerNodeManagerRef>, /* Option<> will be removed when `WorkerNodeManager` is used in code. */
catalog_cache: Arc<Mutex<CatalogCache>>,
Member

Suggested change
catalog_cache: Arc<Mutex<CatalogCache>>,
catalog_cache: Arc<RwLock<CatalogCache>>,

Suggest using RwLock here.

Contributor Author

Will reading the catalog be more frequent than writing the catalog?

Member

For example, when we're waiting for the table to be created, we'll frequently read the catalog.
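
A minimal sketch with stand-in types, using tokio's RwLock (a sync RwLock would look similar): the busy wait only needs shared read access, while ObserverManager takes the write lock to apply a notification:

use std::collections::HashSet;
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::RwLock;

#[derive(Default)]
struct CatalogCache {
    databases: HashSet<String>,
}

async fn wait_for_database(cache: Arc<RwLock<CatalogCache>>, db_name: &str) {
    // Many readers can hold the lock at once while polling.
    while !cache.read().await.databases.contains(db_name) {
        tokio::time::sleep(Duration::from_micros(10)).await;
    }
}

async fn apply_notification(cache: Arc<RwLock<CatalogCache>>, db_name: String) {
    // Only the writer needs exclusive access.
    cache.write().await.databases.insert(db_name);
}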

&Info::Database(database),
NotificationTarget::Frontend,
)
.await?;
Member

What's the plan for handling a notification error? It seems notify is a try_join_all over notifications to all nodes, and an error on one of them will abort the whole process, which will lead to an inconsistent situation.
Should there be a way to ignore notification errors on some frontends, or a way to record the notification progress for each node with a corresponding retry strategy?

Member

Or just leave a TODO here.

Contributor Author

Ok, I will add a TODO here.

Member

It should be handled when an error occurs, as I listed in the error-handling part of #361. For this PR, you can leave a TODO here.
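
A minimal sketch of the per-node bookkeeping idea, with a hypothetical notify_one helper standing in for the real per-node notification RPC: join_all collects every node's result instead of aborting on the first error, so failures can be logged or retried per node.

use futures::future::join_all;

type WorkerId = u32;

async fn notify_one(worker: WorkerId, info: &str) -> Result<(), String> {
    // Stand-in for the real per-node notification RPC.
    let _ = (worker, info);
    Ok(())
}

async fn notify_all(workers: &[WorkerId], info: &str) -> Vec<(WorkerId, Result<(), String>)> {
    let results = join_all(workers.iter().map(|w| notify_one(*w, info))).await;
    // Callers can retry or ignore the workers whose result is Err.
    workers.iter().copied().zip(results).collect()
}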

.is_none()
{
info!("Wait to create table: {}", table.table_name);
time::sleep(Duration::from_micros(10)).await;
Member

Contributor Author

Suppose there are two frontends, A and B. If A creates a database, not only A's but also B's CatalogCache will be updated by its ObserverManager. Currently, ObserverManager doesn't know which node the notification from meta originated from. I think it would be wrong for B's ObserverManager to also send a message on B's channel.

Member

@BugenZhao BugenZhao Mar 2, 2022

Just wrap changed in the loop as well. :)

@HuaHuaY HuaHuaY force-pushed the zehua/support_catalog_update branch from c3f0c8d to 16798f5 Compare March 3, 2022 02:38
Comment on lines 411 to 412
self.catalog_updated_rx
.to_owned()
Member

We should clone this rx before the RPC call on meta_client, since data received before the clone is always considered seen. We may lose a notification this way.

Contributor Author

Is there documentation about this? Or have I misunderstood the code? I browsed the code of its Clone implementation; it just copies the version value and the pointer to the shared state.

Contributor Author

@HuaHuaY HuaHuaY Mar 3, 2022

Is there documentation about this? Or have I misunderstood the code? I browsed the code of its Clone implementation; it just copies the version value and the pointer to the shared state.

We can't clone it inside the while loop. If there is a message not yet marked as seen before the clone, the program will get stuck here.

.to_owned()
.changed()
.await
.map_err(|e| RwError::from(InternalError(e.to_string())))?;
Collaborator

The while loop may still miss some updates in some concurrent scenarios, for example

  1. Frontend A creates a database D
  2. Frontend A got updates from catalog_updated_rx
  3. Someone else drops database D immediately
  4. Frontend A continues from step 2 and checks catalog_cache but doesn't find D
  5. Frontend A hangs forever

Since almost all DDL statements need to wait for catalog updates, I think we need some mechanism to let the frontend wait for a specific request_id or catalog_version or catalog_epoch, which would be piggy-backed in the Notification RPC to allow the frontend DDL executor to wait for it.

This is not very easy. We may leave a TODO here and fix it later.
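
A minimal sketch of the catalog-version idea, with hypothetical names (wait_for_version, a u64 version published through a tokio watch channel): the DDL path waits until the locally observed version reaches the one returned by its RPC, so a subsequent drop cannot make it hang.

use tokio::sync::watch;

async fn wait_for_version(
    mut version_rx: watch::Receiver<u64>,
    target_version: u64, // piggy-backed in the Create/Drop RPC response
) -> Result<(), watch::error::RecvError> {
    while *version_rx.borrow() < target_version {
        // Resolves once ObserverManager publishes a newer catalog version;
        // even if the object is dropped right away, the version still advances.
        version_rx.changed().await?;
    }
    Ok(())
}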

Contributor Author

HuaHuaY commented Mar 3, 2022

Now, in unit tests, we do not use RPC. However, the return type of the subscribe RPC in the meta client is tonic::Streaming<>, which tonic converts from a ReceiverStream.

async fn subscribe(&self, request: SubscribeRequest) -> Result<Streaming<SubscribeResponse>> { ... }

It seems that we have no way to manually convert a ReceiverStream into tonic::Streaming<>. So I created a trait NotificationStream, which has only one method, next, and is used as the return type of subscribe. I implemented this trait for tonic::Streaming<> and ReceiverStream respectively.

async fn subscribe(&self, request: SubscribeRequest) -> Result<Box<dyn NotificationStream>> { ... }

#[async_trait::async_trait]
pub trait NotificationStream: Send {
    /// Ok(Some) => receive a `SubscribeResponse`.
    /// Ok(None) => stream terminates.
    /// Err => error happens.
    async fn next(&mut self) -> Result<Option<SubscribeResponse>>;
}

#[async_trait::async_trait]
impl NotificationStream for Streaming<SubscribeResponse> { ... }

#[async_trait::async_trait]
impl NotificationStream for Receiver<std::result::Result<SubscribeResponse, Status>> { ... }
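
For reference, one plausible shape of the two impls, with the response and result types abbreviated to stand-ins (the real code uses the generated SubscribeResponse and the project's Result type):

use tokio::sync::mpsc::Receiver;
use tonic::{Status, Streaming};

type SubscribeResponse = (); // stand-in for the generated proto type
type Result<T> = std::result::Result<T, Status>; // stand-in error type

#[async_trait::async_trait]
pub trait NotificationStream: Send {
    async fn next(&mut self) -> Result<Option<SubscribeResponse>>;
}

#[async_trait::async_trait]
impl NotificationStream for Streaming<SubscribeResponse> {
    async fn next(&mut self) -> Result<Option<SubscribeResponse>> {
        // `Streaming::message` yields the next item from the gRPC stream.
        self.message().await
    }
}

#[async_trait::async_trait]
impl NotificationStream for Receiver<std::result::Result<SubscribeResponse, Status>> {
    async fn next(&mut self) -> Result<Option<SubscribeResponse>> {
        // `Receiver::recv` returns None when the test-side sender is dropped.
        self.recv().await.transpose()
    }
}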

@HuaHuaY HuaHuaY merged commit 01615a2 into main Mar 3, 2022
@HuaHuaY HuaHuaY deleted the zehua/support_catalog_update branch March 3, 2022 09:36
