[rfc] design exploration: implementations for operators against NestedIterator #2103

tantaman · 2024-08-01T02:22:35Z

This should be the final form (in terms of types) of NestedIterable. The operators (join/map/filter) are sketches to confirm the shape of NestedIterable is what we need. NestedIterable is an Iterable of Entry.

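import type {JSONObject} from '../../shared/src/json.js';

// Event constants, as defined in the test file.
const ADD = 1;
const REMOVE = -1;
const NO_OP = 0;

const event = Symbol();
const node = Symbol();
type Add = typeof ADD;
type Remove = typeof REMOVE;
type NoOp = typeof NO_OP;
type Event = Add | Remove | NoOp;
type Entry<Type = JSONObject> = {
  [node]: Type; // the row itself
  [event]: Event; // whether the row is added, removed, or unchanged
  [children: string]: Iterable<Entry>; // child rows, keyed by name
};
type NestedIterable = Iterable<Entry>;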
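To make the shape concrete, here is a minimal sketch of one value of this type and a walk over it. The `comments` child name, the sample rows, the symbol descriptions, and the `describe` helper are all illustrative assumptions, and `JSONObject` is replaced by a local stand-in so the snippet is self-contained.

```typescript
// Local stand-in for the JSONObject type imported in the PR (assumption).
type JSONObject = {[key: string]: unknown};

const ADD = 1;
const REMOVE = -1;
const NO_OP = 0;
type Event = typeof ADD | typeof REMOVE | typeof NO_OP;

const event = Symbol('event');
const node = Symbol('node');

type Entry<Type = JSONObject> = {
  [node]: Type;
  [event]: Event;
  [children: string]: Iterable<Entry>;
};

// One issue with two nested comments; the second comment is a removal.
const issues: Iterable<Entry> = [
  {
    [node]: {id: 1, title: 'bug'},
    [event]: ADD,
    comments: [
      {[node]: {id: 10, body: 'first'}, [event]: ADD},
      {[node]: {id: 11, body: 'second'}, [event]: REMOVE},
    ],
  },
];

// Hypothetical helper: flatten the tree into indented "event row" lines.
function describe(entries: Iterable<Entry>, depth = 0): string[] {
  const out: string[] = [];
  for (const entry of entries) {
    const indent = '  '.repeat(depth);
    out.push(`${indent}${entry[event]} ${JSON.stringify(entry[node])}`);
    // Object.keys skips symbol keys, so only child iterables are visited.
    for (const key of Object.keys(entry)) {
      out.push(...describe(entry[key], depth + 1));
    }
  }
  return out;
}

const lines = describe(issues);
```

Because the row and event live under symbol keys, every string key on an entry can be treated as a child iterable without further checks.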

I've explored:

join
topk
filter
map

As well as:

forking (for or)
merging
merge-distinct (for or)

The latter set uses a restartable iterable so that multiple iterators can be obtained from the base Iterable, allowing every fork to see the same data.

Related document: https://www.notion.so/replicache/NestedIterable-5123f11b877e41b7bc9f00486d491d8b#adb39380a6f74402ace02cb85fe0c405

There are some possible future explorations:

join when dealing with deltas and many:1 & many:many relationships
Passing sort information to topk along with the iterable

But it seems like diminishing returns at this point.

vercel · 2024-08-01T02:22:38Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
replicache-docs	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Aug 1, 2024 5:16pm
zeppliear	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Aug 1, 2024 5:16pm

aboodman · 2024-08-02T00:46:08Z

packages/zql/src/iterable-explore.test.ts

+const event = Symbol();
+const node = Symbol();
+type Event = Add | Remove | NoOp;
+const ADD = 1;


What if we just give them string values so we can catch/find if there is any reason to do arithmetic. I don't think there is, but then we'll know for sure.

One reason to use SMIs over strings is efficiency. SMIs are stored inline as immediate values, while strings are heap allocated.

it all depends on how hot things are and with global warming it is going to get pretty hot

aboodman · 2024-08-02T00:48:26Z

packages/zql/src/iterable-explore.test.ts

+import type {JSONObject} from '../../shared/src/json.js';
+
+// eslint-disable-next-line @typescript-eslint/no-explicit-any
+type TODO = any;


What is this?

a silly way to cast things as any without needing an eslint comment each time 😅

aboodman · 2024-08-02T00:51:07Z

packages/zql/src/iterable-explore.test.ts

+type Remove = typeof REMOVE;
+type NoOp = typeof NO_OP;
+type Entry<Type = JSONObject> = {
+  [node]: Type;


Why does the parent row get referred to by this symbol, but child iterators get actual names based on their table?

mergeDistinct needed a way to find an id for a row. Giving the parent row a stable name made that easier to do. Otherwise we'd need to iterate all the keys in the row and find the thing that isn't iterable.
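As a rough illustration of why the stable key helps, the sketch below deduplicates rows across several branches by reading the id through the symbol. It is not the PR's `mergeDistinct`: it simply concatenates branches and drops repeats, and the `Row` type with a numeric `id` is an assumption.

```typescript
// Assumed row shape: every row carries a numeric id.
type Row = {id: number; [key: string]: unknown};

const node = Symbol('node');
type Entry = {
  [node]: Row;
  [children: string]: Iterable<Entry>;
};

// Sketch only: yield each row once, identified by entry[node].id.
function* mergeDistinct(
  ...branches: Iterable<Entry>[]
): IterableIterator<Entry> {
  const seen = new Set<number>();
  for (const branch of branches) {
    for (const entry of branch) {
      // The symbol key gives direct access to the row; no key scan needed.
      const id = entry[node].id;
      if (seen.has(id)) continue;
      seen.add(id);
      yield entry;
    }
  }
}

const branchA: Entry[] = [{[node]: {id: 1}}, {[node]: {id: 2}}];
const branchB: Entry[] = [{[node]: {id: 2}}, {[node]: {id: 3}}];
const mergedIds = Array.from(mergeDistinct(branchA, branchB)).map(
  e => e[node].id,
);
```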

aboodman · 2024-08-02T00:52:48Z

packages/zql/src/iterable-explore.test.ts

+type Remove = typeof REMOVE;
+type NoOp = typeof NO_OP;
+type Entry<Type = JSONObject> = {
+  [node]: Type;


If we're going to use a symbol here, let's try to pick a unique name. How about Entity, which is the name we used before? Or Row?

Also is it enforced that every value flowing through the pipeline has a unique ID as it does today?

every row will have a primary key and we can't mix rows from different tables in the same iterable level so I think this is true.

Sorry I pressed publish on this review too quickly and my comments were kind of cryptic. Here's what I should have said:

It's better if we pick names for things in the system that are unique and not already used by other concepts. The name "node" is already often used to refer to pipeline nodes. So what about using a different name like "row" or "entity"? The reason this is important is so that when talking about the system we can just use a short name for a concept rather than having to qualify it. Like we can say "entity" rather than "pipeline entry node".

In the existing ivm, we require all things flowing through the pipeline to have a unique ID. Are we going to in the new system too? If we are then how come there's no id field here? is it just because this is a sketch and the id wasn't required, or is its absence important somehow? If there's going to be an ID for each of the thingies flowing through the pipeline, then the name "entity" or "PipelineEntity" makes even more sense.

Sorry for obsessing so much about names, but I think a big part of system design is just choosing good names.

I've renamed things in the PR that is meant to be merged into main:

https://github.com/rocicorp/mono/pull/2109/files : packages/zql/src/zql/ivm-2/iterable-tree.ts

I've also gone back to requiring all Entity types to have an id field. We'll need to revisit this when we add compound primary keys which we've discussed before.

aboodman · 2024-08-02T00:53:40Z

packages/zql/src/iterable-explore.test.ts

+type Event = Add | Remove | NoOp;
+const ADD = 1;
+const REMOVE = -1;
+const NO_OP = 0;


Total nit but I'd call this NOP out of tradition and ease of typing.

aboodman · 2024-08-02T01:14:56Z

packages/zql/src/iterable-explore.test.ts

+}[];
+
+function* filter(
+  path: (string | typeof node)[],


It's only valid for node to be the last entry in the path right? Little awkward since the path represented can be invalid.

yeah, that's true. This should work: type path = [...string[], symbol];

jeez, does that really work? u crazy typescript.

alternately we can just make the path be the prefix and not include / assume the last element.

right. The last element is always the symbol now. When I first started this (a couple commits back) it could be a string and have any name.
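For reference, a small sketch of the tuple type discussed above, using a leading rest element (supported since TypeScript 4.2). The `Path` alias and the `prefixOf` helper are illustrative names, not from the PR.

```typescript
// Stand-in for the PR's row symbol.
const node = Symbol('node');

// Zero or more child names, always terminated by the row symbol.
type Path = [...string[], typeof node];

const nested: Path = ['comments', 'revisions', node];
const topLevel: Path = [node];
// const invalid: Path = [node, 'comments']; // rejected by the compiler

// Hypothetical helper: separate the child-name prefix from the terminator.
function prefixOf(path: Path): string[] {
  return path.slice(0, -1) as string[];
}
```

The alternative mentioned in the thread, passing only the prefix as a plain `string[]`, avoids the cast in `prefixOf` at the cost of making the terminator implicit.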

aboodman · 2024-08-02T01:20:39Z

packages/zql/src/iterable-explore.test.ts

+  cb: (v: unknown) => boolean,
+): IterableIterator<Entry> {
+  const [head, ...tail] = path;
+  for (const row of iterable) {


This bit is really nice.

aboodman · 2024-08-02T01:21:33Z

packages/zql/src/iterable-explore.test.ts

+  }
+}
+
+function* loopJoin(


I don't understand the significance of the word loop here. Explain?

just that it is a dumb nestedLoopJoin / an n^2 join.
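A minimal sketch of such a nested loop join over entries, for illustration only. The signature here (key names plus a `childName`) is an assumption and is simpler than the PR's, which takes paths; it joins at the top level of the left side only and requires the right side to be re-iterable.

```typescript
type Row = {[key: string]: unknown};

const node = Symbol('node');
type Entry = {
  [node]: Row;
  [children: string]: Iterable<Entry>;
};

// For every left row, scan the whole right side: O(n * m) comparisons.
function* loopJoin(
  left: Iterable<Entry>,
  right: Iterable<Entry>,
  leftKey: string,
  rightKey: string,
  childName: string,
): IterableIterator<Entry> {
  for (const l of left) {
    // Lazy, re-iterable view of the matching right rows.
    const matches: Iterable<Entry> = {
      *[Symbol.iterator]() {
        for (const r of right) {
          if (r[node][rightKey] === l[node][leftKey]) yield r;
        }
      },
    };
    // Copy the left entry and attach the matches as a named child.
    const joined: Entry = {...l};
    joined[childName] = matches;
    yield joined;
  }
}

const issueRows: Entry[] = [{[node]: {id: 1}}, {[node]: {id: 2}}];
const commentRows: Entry[] = [
  {[node]: {id: 10, issueId: 1}},
  {[node]: {id: 11, issueId: 1}},
  {[node]: {id: 12, issueId: 2}},
];
const joinedRows = Array.from(
  loopJoin(issueRows, commentRows, 'id', 'issueId', 'comments'),
);
const firstIssueCommentIds = Array.from(joinedRows[0]['comments']).map(
  e => e[node]['id'],
);
```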

aboodman · 2024-08-02T01:29:12Z

packages/zql/src/iterable-explore.test.ts

+function* loopJoin(
+  left: Iterable<Entry>,
+  right: Iterable<Entry>,
+  leftItemPath: (string | typeof node)[],


Do we actually need the ability to join at a level other than the top on the left side? It's not exercised below in your tests, and i can't think of a case from our featureset.

The right side is always the top. The left side could be an arbitrary depth.

issue -> comments -> revisions -> author

the revision would be added inside the comments iterable of the issue.

It's not exercised below in your tests

test('loop join with a loop join') exercises it.

You're right, I was thinking of it backward. Thanks.

aboodman · 2024-08-02T01:37:32Z

packages/zql/src/iterable-explore.test.ts

+  expect(numVisits).toEqual(4 * issueSource.length);
+});
+
+class RestartableIterableIterator<T> {


arv

Thanks for this.

Now that I've seen this and reread the notion doc things are slowly falling into place.

This will need a lot of comments and more exhaustive tests but I understand where this is taking us.

arv · 2024-08-02T07:53:38Z

packages/zql/src/iterable-explore.test.ts

@@ -0,0 +1,614 @@
+import {expect, test} from 'vitest';
+import type {JSONObject} from '../../shared/src/json.js';


Suggested change

-import type {JSONObject} from '../../shared/src/json.js';
+import type {JSONObject} from 'shared/src/json.js';

Is there any way to get vscode to default to the correct path? I find myself always manually fixing this.

I think so... Try this pref:

javascript.preferences.importModuleSpecifier

arv · 2024-08-02T08:02:26Z

packages/zql/src/iterable-explore.test.ts

+  id: number;
+  title: string;
+};
+const event = Symbol();


I wonder if we should include a description in these?

In the past I haven't because symbols are generally not part of a public API so no one needs to know what they are but for debugging things it is nice.

adding descriptions in the "production version" of this.
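A short illustration of the debugging benefit: a described symbol identifies itself when printed, while an undescribed one does not. The description strings here are assumptions.

```typescript
const anonymous = Symbol();
const event = Symbol('event');

// The description shows up in the string form, e.g. in logs and debuggers.
const anonymousLabel = anonymous.toString();
const eventLabel = event.toString();
```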

arv · 2024-08-02T08:07:33Z

packages/zql/src/iterable-explore.test.ts

+  iterable: Iterable<Entry>,
+  cb: (v: unknown) => boolean,
+): IterableIterator<Entry> {
+  const [head, ...tail] = path;


This can be a bit more efficient using:

const head = path[0];

and tail can be a skip iterator that skips 1

arv · 2024-08-02T08:07:50Z

packages/zql/src/iterable-explore.test.ts

+function* filter(
+  path: (string | typeof node)[],
+  iterable: Iterable<Entry>,
+  cb: (v: unknown) => boolean,


these are commonly called p as in predicate

also, why unknown?

Yeah, I think it should be filter<T>( ..., p: (v: T) => boolean)

Only trying to get the general idea across to help illustrate and refine the design. I'll end up closing this PR without merging it and maybe referencing it from the design doc.
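Pulling the review suggestions on `filter` together, here is one possible sketch: a generic predicate named `p`, a path that is only the child-name prefix, and a depth index instead of destructuring `[head, ...tail]` on every level. This is an assumption about how the pieces could fit, not the PR's code; the returned child views are single-use generators, and the sketch assumes every entry has the child named in the path.

```typescript
const node = Symbol('node');
type Entry = {
  [node]: unknown;
  [children: string]: Iterable<Entry>;
};

function* filter<T>(
  path: readonly string[], // child names leading to the level to filter
  iterable: Iterable<Entry>,
  p: (v: T) => boolean,
  depth = 0, // index into path; avoids allocating a tail array per level
): IterableIterator<Entry> {
  if (depth === path.length) {
    // End of the prefix: test the rows at this level.
    for (const entry of iterable) {
      if (p(entry[node] as T)) yield entry;
    }
    return;
  }
  const head = path[depth];
  for (const entry of iterable) {
    // Keep the entry, but swap the child on the path for a filtered view.
    const copy: Entry = {...entry};
    copy[head] = filter(path, entry[head], p, depth + 1);
    yield copy;
  }
}

const issueTree: Entry[] = [
  {[node]: {id: 1}, comments: [{[node]: {id: 10}}, {[node]: {id: 11}}]},
  {[node]: {id: 2}, comments: [{[node]: {id: 12}}]},
];

// Filter nested comments, keeping every issue.
const byComment = Array.from(
  filter(['comments'], issueTree, (c: {id: number}) => c.id > 10),
);
const keptCommentIds = Array.from(byComment[0]['comments']).map(
  e => (e[node] as {id: number}).id,
);

// Filter the top level with an empty prefix.
const byIssue = Array.from(
  filter([], issueTree, (i: {id: number}) => i.id === 2),
);
```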

arv · 2024-08-02T08:16:33Z

packages/zql/src/iterable-explore.test.ts

+    for (let i = 0; i < k; i++) {
+      yield sorted[i];
+    }


or yield* sorted.slice(0, k)
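A sketch of the suggested form in context. The surrounding `topk` signature (an iterable, a comparator, and `k`) is assumed for illustration; this version sorts the whole input, so it costs O(n log n) regardless of `k`.

```typescript
function* topk<T>(
  iterable: Iterable<T>,
  compare: (a: T, b: T) => number,
  k: number,
): IterableIterator<T> {
  const sorted = Array.from(iterable).sort(compare);
  // slice clamps to the array length, so k larger than the input is safe,
  // unlike an index loop that would yield undefined past the end.
  yield* sorted.slice(0, k);
}

const smallestTwo = Array.from(topk([5, 1, 4, 2], (a, b) => a - b, 2));
const fewerThanK = Array.from(topk([3], (a, b) => a - b, 2));
```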

arv · 2024-08-02T08:20:42Z

packages/zql/src/iterable-explore.test.ts

+  }
+
+  for (const iter of iters) {
+    iter.return!();


iter.return?.()
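The reason for the optional call: `return` is an optional member of the iterator protocol, so a hand-written iterator may not have it. A small sketch, with `closeAll` as a hypothetical helper name:

```typescript
// Close every iterator that supports early termination.
function closeAll<T>(iters: Iterator<T>[]): void {
  for (const iter of iters) {
    // No-op when the iterator has no return method; iter.return!() would throw.
    iter.return?.();
  }
}

let cleanedUp = false;
function* withCleanup(): Generator<number> {
  try {
    yield 1;
    yield 2;
  } finally {
    cleanedUp = true;
  }
}

// A minimal iterator that implements only next().
const bare: Iterator<number> = {
  next: () => ({done: true, value: undefined}),
};

const started = withCleanup();
started.next(); // enter the try block so that return() triggers finally
closeAll([started, bare]);
```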

arv · 2024-08-02T08:29:37Z

packages/zql/src/iterable-explore.test.ts

+    return new RestartableIterableIterator(this.#func, true);
+  }
+  next() {
+    return this.#iter!.next();


#iter is only set if invoke is true in the constructor. Is this correct?

I assume this is only used by tests but maybe:

restartable(...): Iterable

or split into two classes since the new RestartableIterableIterator(..., false) only works as an Iterable but new RestartableIterableIterator(..., true) works as an IterableIterator.

I wish these interfaces were less confusing.

I find this whole restartable thing confusing myself.

Like you could technically invoke it in an invalid state (calling next before the generator was ever called) which strikes me as really strange.

Do you have any thoughts on how to fix this?
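One way to read the `restartable(...): Iterable` suggestion is a plain factory that exposes only `[Symbol.iterator]`, so there is no `next` to call before the generator has been invoked. The sketch below is an assumption about that shape, not the PR's class.

```typescript
// Each call to [Symbol.iterator]() invokes func again, so every consumer
// (for example, every fork) gets a fresh pass over the same data.
function restartable<T>(func: () => Iterator<T>): Iterable<T> {
  return {
    [Symbol.iterator]: () => func(),
  };
}

let runs = 0;
const source = restartable<number>(function* () {
  runs++;
  yield 1;
  yield 2;
});

const firstPass = Array.from(source);
const secondPass = Array.from(source);
```

The invalid state disappears because the wrapper is only an Iterable; the IterableIterator is whatever the generator function returns on each invocation.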

tantaman added 3 commits July 31, 2024 18:07

sketch out nested iterable operator implementations

f4c3769

experiment with restartable IterableIterators

689ebd3

join exploration

afc835b

vercel bot deployed to Preview – zeppliear August 1, 2024 02:22 View deployment

tantaman added 2 commits August 1, 2024 11:24

topk exploration and another join test

548f1b4

fork and merge exploration -- building up to no more tx-distinct

0c78c82

and maybe this obviates `genCached` too?

vercel bot deployed to Preview – replicache-docs August 1, 2024 15:57 View deployment

vercel bot deployed to Preview – zeppliear August 1, 2024 16:00 View deployment

remove broken restartable implementation

0e379a9

tantaman force-pushed the mlaw/iterable-explore branch from 8177b75 to 0e379a9 Compare August 1, 2024 16:04

vercel bot deployed to Preview – replicache-docs August 1, 2024 16:07 View deployment

vercel bot deployed to Preview – zeppliear August 1, 2024 16:07 View deployment

stable name for the parent row so we can extract the id for `mergeDis…

b7f3d73

…tinct`

vercel bot deployed to Preview – replicache-docs August 1, 2024 17:03 View deployment

vercel bot deployed to Preview – zeppliear August 1, 2024 17:04 View deployment

distinct merge to replcae tx-distinct for or paths

ad56a6c

tantaman force-pushed the mlaw/iterable-explore branch from f2c2781 to ad56a6c Compare August 1, 2024 17:12

tantaman changed the title ~~explore implementations for operators against NestedIterator~~ design exploration: implementations for operators against NestedIterator Aug 1, 2024

vercel bot deployed to Preview – replicache-docs August 1, 2024 17:15 View deployment

tantaman changed the title ~~design exploration: implementations for operators against NestedIterator~~ [rfc] design exploration: implementations for operators against NestedIterator Aug 1, 2024

vercel bot deployed to Preview – zeppliear August 1, 2024 17:16 View deployment

tantaman marked this pull request as ready for review August 1, 2024 17:19

tantaman requested review from aboodman and arv August 1, 2024 17:24

aboodman reviewed Aug 2, 2024

arv reviewed Aug 2, 2024

tantaman closed this Aug 7, 2024

tantaman deleted the mlaw/iterable-explore branch March 24, 2025 15:13
