Skip to content

Conversation

tantaman
Copy link
Contributor

@tantaman tantaman commented Aug 1, 2024

This should be the final form (in terms of types) of NestedIterable. The operators (join/map/filter) are sketches to confirm the shape of NestedIterable is what we need. NestedIterable is an Iterable of Entry.

const event = Symbol();
const node = Symbol();
type Add = typeof ADD;
type Remove = typeof REMOVE;
type NoOp = typeof NO_OP;
type Event = Add | Remove | NoOp;
type Entry<Type = JSONObject> = {
  [node]: Type;
  [event]: Event;
  [children: string]: Iterable<Entry>;
};
type NestedIterable = Iterable<Entry>;

I've explored:

  • join
  • topk
  • filter
  • map

As well as:

  • forking (for or)
  • merging
  • merge-distinct (for or)

The latter set uses restartable so many iterators can be gathered from the base Iterable, allowing all forks to get the same data.

Related document: https://www.notion.so/replicache/NestedIterable-5123f11b877e41b7bc9f00486d491d8b#adb39380a6f74402ace02cb85fe0c405

There are some possible future explorations:

  • join when dealing with deltas and many:1 & many:many relationships
  • Passing sort information to topk along with the iterable

But it seems like diminishing returns at this point.

Copy link

vercel bot commented Aug 1, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
replicache-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Aug 1, 2024 5:16pm
zeppliear ✅ Ready (Inspect) Visit Preview 💬 Add feedback Aug 1, 2024 5:16pm

@tantaman tantaman force-pushed the mlaw/iterable-explore branch from f2c2781 to ad56a6c Compare August 1, 2024 17:12
@tantaman tantaman changed the title explore implementations for operators against NestedIterator design exploration: implementations for operators against NestedIterator Aug 1, 2024
@tantaman tantaman changed the title design exploration: implementations for operators against NestedIterator [rfc] design exploration: implementations for operators against NestedIterator Aug 1, 2024
@tantaman tantaman marked this pull request as ready for review August 1, 2024 17:19
@tantaman tantaman requested review from aboodman and arv August 1, 2024 17:24
const event = Symbol();
const node = Symbol();
type Event = Add | Remove | NoOp;
const ADD = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we just give them string values so we can catch/find if there is any reason to do arithmetic. I don't think there is, but then we'll know for sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One reason to use SMIs over string is efficiency. SMIs are stack allocated and strings heap allocated.

it all depends on how hot things are and with global warming it is going to get pretty hot

import type {JSONObject} from '../../shared/src/json.js';

// eslint-disable-next-line @typescript-eslint/no-explicit-any
type TODO = any;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a silly way to put cast things as any without needing an eslint comment each time 😅

type Remove = typeof REMOVE;
type NoOp = typeof NO_OP;
type Entry<Type = JSONObject> = {
[node]: Type;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the parent row get referred to by this symbol, but child iterators get actual names based on their table?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mergeDistinct needed a way to find and id for a row. Giving the parent row a stable name made that easier to do. Otherwise we'd need to iterate all the keys in the row and find the thing that isn't iterable.

type Remove = typeof REMOVE;
type NoOp = typeof NO_OP;
type Entry<Type = JSONObject> = {
[node]: Type;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to use a symbol here let's try to pick a unique name. How about Entity that is the name we used before. Or Row?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also is it enforced that every value flowing through the pipeline has a unique ID as it does today?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

every row will have a primary key and we can't mix rows from different tables in the same iterable level so I think this is true.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I pressed publish on this review too quickly and my comments were kind of cryptic. Here's what I should have said:

  1. It's better if we pick names for things in the system that are unique and not already used by other concepts. The name "node" is already often used to refer to pipeline nodes. So what about using a different name like "row" or "entity"? The reason this is important is so that when talking about the system we can just use a short name for a concept rather than having to qualify it. Like we can say "entity" rather than "pipeline entry node".
  2. In the existing ivm, we require all things flowing through the pipeline to have a unique ID. Are we going to in the new system too? If we are then how come there's no id field here? is it just because this is a sketch and the id wasn't required, or is its absence important somehow? If there's going to be an ID for each of the thingies flowing through the pipeline, then the name "entity" or "PipelineEntity" makes even more sense.

Sorry for obsessing so much about names, but I think a big part of system design is just choosing good names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed things in the PR that is meant to be merged into main:

I've also gone back to requiring all Entity types to have an id field. We'll need to revisit this when we add compound primary keys which we've discussed before.

type Event = Add | Remove | NoOp;
const ADD = 1;
const REMOVE = -1;
const NO_OP = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Total nit but I'd call this NOP out of tradition and ease of typing.

}[];

function* filter(
path: (string | typeof node)[],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only valid for node to be the last entry in the path right? Little awkward since the path represented can be invalid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's true. This should work type path = [...string[], symbol];

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jeez, does that really work? u crazy typescript.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternately we can just make the path be the prefix and not include / assume the last element.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. The last element is always the symbol now. When I first started this (a couple commits back) it could be a string and have any name.

cb: (v: unknown) => boolean,
): IterableIterator<Entry> {
const [head, ...tail] = path;
for (const row of iterable) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit is really nice.

}
}

function* loopJoin(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the significance of the word loop here. Explain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just that it is a dumb nestedLoopJoin / an n^2 join.

function* loopJoin(
left: Iterable<Entry>,
right: Iterable<Entry>,
leftItemPath: (string | typeof node)[],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need the ability to join at a level other than the top on the left side? It's not exercised below in your tests, and i can't think of a case from our featureset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The right side is always the top. The left side could be an arbitrary depth.

issue -> comments -> revisions -> author

the revision would be added inside the comments iterable of the issue.

It's not exercised below in your tests

test('loop join with a loop join') exercises it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I was thinking of it backward. Thanks.

expect(numVisits).toEqual(4 * issueSource.length);
});

class RestartableIterableIterator<T> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cute

Copy link
Contributor

@arv arv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this.

Now that I've seen this and reread the notion doc things are slowly falling into place.

This will need a lot of comments and more exhaustive tests but I understand where this is taking us.

@@ -0,0 +1,614 @@
import {expect, test} from 'vitest';
import type {JSONObject} from '../../shared/src/json.js';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import type {JSONObject} from '../../shared/src/json.js';
import type {JSONObject} from 'shared/src/json.js';

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to get vscode to default to the correct path? I find myself always manually fixing this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so... Try this pref:

javascript.preferences.importModuleSpecifier

id: number;
title: string;
};
const event = Symbol();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should include a description in these?

In the past I haven't because symbols are generally not part of a public API so no one needs to know what they are but for debugging things it is nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding descriptions in the "production version" of this.

const event = Symbol();
const node = Symbol();
type Event = Add | Remove | NoOp;
const ADD = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One reason to use SMIs over string is efficiency. SMIs are stack allocated and strings heap allocated.

it all depends on how hot things are and with global warming it is going to get pretty hot

iterable: Iterable<Entry>,
cb: (v: unknown) => boolean,
): IterableIterator<Entry> {
const [head, ...tail] = path;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a bit more efficient using:

const head = path[0];

and tail can be a skip iterator that skips 1

function* filter(
path: (string | typeof node)[],
iterable: Iterable<Entry>,
cb: (v: unknown) => boolean,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are commonly called p as in predicate

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, why unknown?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it should be filter<T>( ..., p: (v: T) => boolean)

Only trying to get the general idea across to help illustrate and refine the design. I'll end up closing this PR without merging it and maybe referencing it from the design doc.

Comment on lines +193 to +195
for (let i = 0; i < k; i++) {
yield sorted[i];
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or yield* sorted.slice(0, k)

}

for (const iter of iters) {
iter.return!();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iter.return?.()

return new RestartableIterableIterator(this.#func, true);
}
next() {
return this.#iter!.next();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#iter is only set if invoke is true in the constructor. Is this correc?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is only used by tests but maybe:

restartable(...): Iterable

or split into two classes since the new RestartableIterableIterator(..., false) only works as an Iterable but new RestartableIterableIterator(..., true) works as an IterableIterator.

I wish these interfaces were less confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this whole restartable thing confusing myself.

Like you could technically invoke it in an invalid state (calling next before the generator was ever called) which strikes me as really strange.

Do you have any thoughts on how to fix this?

@tantaman tantaman closed this Aug 7, 2024
@tantaman tantaman deleted the mlaw/iterable-explore branch March 24, 2025 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants