[rfc] design exploration: implementations for operators against NestedIterator #2103

tantaman · 2024-08-01T02:22:35Z

This should be the final form (in terms of types) of NestedIterable. The operators (join/map/filter) are sketches to confirm the shape of NestedIterable is what we need. NestedIterable is an Iterable of Entry.

const event = Symbol();
const node = Symbol();
type Add = typeof ADD;
type Remove = typeof REMOVE;
type NoOp = typeof NO_OP;
type Event = Add | Remove | NoOp;
type Entry<Type = JSONObject> = {
  [node]: Type;
  [event]: Event;
  [children: string]: Iterable<Entry>;
};
type NestedIterable = Iterable<Entry>;
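
To make the shape concrete, here is a minimal sketch of how lazy `filter` and `map` operators could be written against it. It is not the code from the PR. The values of `REMOVE` and `NO_OP` are placeholders (only `ADD = 1` is visible in the reviewed diff), `Row` stands in for `JSONObject`, and the symbols and event type are named `nodeKey`, `eventKey`, and `EventTag` only so that a standalone file does not clash with DOM globals such as `event` and `Event`.

```typescript
// Event tags. Only `ADD = 1` appears in the reviewed diff; the other two
// values are placeholders chosen for this sketch.
const ADD = 1;
const REMOVE = 2;
const NO_OP = 3;
type EventTag = typeof ADD | typeof REMOVE | typeof NO_OP;

// Called `node` and `event` in the PR; renamed here to avoid DOM globals.
const nodeKey = Symbol('node');
const eventKey = Symbol('event');

// Stand-in for JSONObject so the sketch has no imports.
type Row = Record<string, unknown>;

type Entry<T = Row> = {
  [nodeKey]: T;
  [eventKey]: EventTag;
  [children: string]: Iterable<Entry>;
};

// Lazily keep only the entries whose row passes the predicate.
function* filter<T>(
  source: Iterable<Entry<T>>,
  predicate: (row: T) => boolean,
): Iterable<Entry<T>> {
  for (const entry of source) {
    if (predicate(entry[nodeKey])) {
      yield entry;
    }
  }
}

// Lazily transform each row, carrying the event tag and any child
// iterables along unchanged.
function* map<T, U>(
  source: Iterable<Entry<T>>,
  fn: (row: T) => U,
): Iterable<Entry<U>> {
  for (const entry of source) {
    const mapped: Entry<U> = {...entry, [nodeKey]: fn(entry[nodeKey])};
    yield mapped;
  }
}
```

Both operators are generators, so nothing is pulled from the source until the consumer iterates. This sketch ignores how `REMOVE` and `NO_OP` entries should be treated by each operator.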

I've explored:

join
topk
filter
map

As well as:

forking (for or)
merging
merge-distinct (for or)

The latter set uses a restartable iterable so that multiple iterators can be obtained from the base Iterable, allowing every fork to see the same data.
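
One way to picture "restartable" is a wrapper that buffers what it has pulled from the base iterable, so every new iterator replays the same values. The sketch below is an assumption about the idea, not the branch's implementation (the timeline shows a broken restartable implementation being removed at one point), and the class name `Restartable` is hypothetical.

```typescript
// Wraps a one-shot iterable so it can be iterated many times.
// Values are pulled from the base at most once and replayed from a buffer.
class Restartable<T> implements Iterable<T> {
  private readonly source: Iterator<T>;
  private readonly buffer: T[] = [];
  private exhausted = false;

  constructor(base: Iterable<T>) {
    this.source = base[Symbol.iterator]();
  }

  *[Symbol.iterator](): IterableIterator<T> {
    let i = 0;
    for (;;) {
      if (i < this.buffer.length) {
        // Replay a value some iterator has already pulled.
        yield this.buffer[i++];
      } else if (this.exhausted) {
        return;
      } else {
        // Pull one more value from the base and remember it.
        const next = this.source.next();
        if (next.done) {
          this.exhausted = true;
        } else {
          this.buffer.push(next.value);
        }
      }
    }
  }
}

function* oneShot(): IterableIterator<number> {
  yield 1;
  yield 2;
  yield 3;
}

// Two forks of the same base both observe every value.
const shared = new Restartable(oneShot());
const forkA = Array.from(shared);
const forkB = Array.from(shared);
```

The trade-off in this sketch is memory: the buffer keeps every value alive for as long as the wrapper is reachable.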

Related document: https://www.notion.so/replicache/NestedIterable-5123f11b877e41b7bc9f00486d491d8b#adb39380a6f74402ace02cb85fe0c405

There are some possible future explorations:

join when dealing with deltas and many:1 & many:many relationships
Passing sort information to topk along with the iterable

But it seems like diminishing returns at this point.


aboodman · 2024-08-02T00:46:08Z

packages/zql/src/iterable-explore.test.ts

+const event = Symbol();
+const node = Symbol();
+type Event = Add | Remove | NoOp;
+const ADD = 1;


What if we just give them string values, so we can find out whether there is any reason to do arithmetic on them? I don't think there is, but then we'll know for sure.

One reason to use SMIs over strings is efficiency. SMIs are stored as immediate values (no heap allocation), whereas strings are heap allocated.

it all depends on how hot things are and with global warming it is going to get pretty hot

aboodman · 2024-08-02T00:48:26Z

packages/zql/src/iterable-explore.test.ts

+import type {JSONObject} from '../../shared/src/json.js';
+
+// eslint-disable-next-line @typescript-eslint/no-explicit-any
+type TODO = any;


What is this?

A silly way to cast things as any without needing an eslint comment each time 😅

aboodman · 2024-08-02T00:51:07Z

packages/zql/src/iterable-explore.test.ts

+type Remove = typeof REMOVE;
+type NoOp = typeof NO_OP;
+type Entry<Type = JSONObject> = {
+  [node]: Type;


Why does the parent row get referred to by this symbol, but child iterators get actual names based on their table?

mergeDistinct needed a way to find an id for a row. Giving the parent row a stable name made that easier to do. Otherwise we'd need to iterate all the keys in the row and find the thing that isn't iterable.
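
A rough sketch of why the stable key helps: with the row always under one symbol, a distinct merge can read the id directly instead of probing each key for a non-iterable value. This is an illustration, not the PR's `mergeDistinct`. It assumes every row has a string `id` field, drops the event tag for brevity, names the symbol `nodeKey` rather than `node`, and simply concatenates its sources instead of merging them in sorted order.

```typescript
// The stable key under which the parent row lives (`node` in the PR).
const nodeKey = Symbol('node');

// Assumption for this sketch: every row carries a string `id`.
type Entity = {id: string};

type Entry<T extends Entity = Entity> = {
  [nodeKey]: T;
  [children: string]: Iterable<Entry>;
};

// Yield each entry from the sources in order, skipping any entry whose
// row id has already been seen.
function* mergeDistinct<T extends Entity>(
  ...sources: Iterable<Entry<T>>[]
): Iterable<Entry<T>> {
  const seen = new Set<string>();
  for (const source of sources) {
    for (const entry of source) {
      // The stable key makes the id lookup a single property access.
      const id = entry[nodeKey].id;
      if (!seen.has(id)) {
        seen.add(id);
        yield entry;
      }
    }
  }
}
```

For an `or` with two branches, each branch would be one source, and a row matched by both branches is emitted once.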

aboodman · 2024-08-02T00:52:48Z

packages/zql/src/iterable-explore.test.ts

+type Remove = typeof REMOVE;
+type NoOp = typeof NO_OP;
+type Entry<Type = JSONObject> = {
+  [node]: Type;


If we're going to use a symbol here, let's try to pick a unique name. How about Entity, which is the name we used before? Or Row?

Also, is it enforced that every value flowing through the pipeline has a unique ID, as it is today?

Every row will have a primary key, and we can't mix rows from different tables at the same iterable level, so I think this is true.

Sorry I pressed publish on this review too quickly and my comments were kind of cryptic. Here's what I should have said:

It's better if we pick names for things in the system that are unique and not already used by other concepts. The name "node" is already often used to refer to pipeline nodes. So what about using a different name like "row" or "entity"? The reason this is important is so that when talking about the system we can just use a short name for a concept rather than having to qualify it. Like we can say "entity" rather than "pipeline entry node".

In the existing ivm, we require all things flowing through the pipeline to have a unique ID. Are we going to require that in the new system too? If we are, then how come there's no id field here? Is it just because this is a sketch and the id wasn't required, or is its absence important somehow? If there's going to be an ID for each of the thingies flowing through the pipeline, then the name "entity" or "PipelineEntity" makes even more sense.

Sorry for obsessing so much about names, but I think a big part of system design is just choosing good names.

I've renamed things in the PR that is meant to be merged into main:

https://github.com/rocicorp/mono/pull/2109/files : packages/zql/src/zql/ivm-2/iterable-tree.ts

I've also gone back to requiring all Entity types to have an id field. We'll need to revisit this when we add compound primary keys which we've discussed before.

tantaman added 3 commits July 31, 2024 18:07

sketch out nested iterable operator implementations

f4c3769

experiment with restartable IterableIterators

689ebd3

join exploration

afc835b


tantaman added 2 commits August 1, 2024 11:24

topk exploration and another join test

548f1b4

fork and merge exploration -- building up to no more tx-distinct

0c78c82

and maybe this obviates `genCached` too?


remove broken restartable implementation

0e379a9

tantaman force-pushed the mlaw/iterable-explore branch from 8177b75 to 0e379a9 Compare August 1, 2024 16:04


stable name for the parent row so we can extract the id for `mergeDis…

b7f3d73

…tinct`


distinct merge to replace tx-distinct for or paths

ad56a6c

tantaman force-pushed the mlaw/iterable-explore branch from f2c2781 to ad56a6c Compare August 1, 2024 17:12

tantaman changed the title ~~explore implementations for operators against NestedIterator~~ design exploration: implementations for operators against NestedIterator Aug 1, 2024


tantaman changed the title ~~design exploration: implementations for operators against NestedIterator~~ [rfc] design exploration: implementations for operators against NestedIterator Aug 1, 2024


tantaman marked this pull request as ready for review August 1, 2024 17:19

tantaman requested review from aboodman and arv August 1, 2024 17:24

aboodman reviewed Aug 2, 2024
