Skip to content

Conversation

tantaman
Copy link
Contributor

@tantaman tantaman commented Aug 1, 2024

This should be the final form (in terms of types) of NestedIterable. The operators (join/map/filter) are sketches to confirm the shape of NestedIterable is what we need. NestedIterable is an Iterable of Entry.

const event = Symbol();
const node = Symbol();
type Add = typeof ADD;
type Remove = typeof REMOVE;
type NoOp = typeof NO_OP;
type Event = Add | Remove | NoOp;
type Entry<Type = JSONObject> = {
  [node]: Type;
  [event]: Event;
  [children: string]: Iterable<Entry>;
};
type NestedIterable = Iterable<Entry>;

I've explored:

  • join
  • topk
  • filter
  • map

As well as:

  • forking (for or)
  • merging
  • merge-distinct (for or)

The latter set uses restartable so many iterators can be gathered from the base Iterable, allowing all forks to get the same data.

Related document: https://www.notion.so/replicache/NestedIterable-5123f11b877e41b7bc9f00486d491d8b#adb39380a6f74402ace02cb85fe0c405

There are some possible future explorations:

  • join when dealing with deltas and many:1 & many:many relationships
  • Passing sort information to topk along with the iterable

But it seems like diminishing returns at this point.

Copy link

vercel bot commented Aug 1, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
replicache-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Aug 1, 2024 5:16pm
zeppliear ✅ Ready (Inspect) Visit Preview 💬 Add feedback Aug 1, 2024 5:16pm

@tantaman tantaman force-pushed the mlaw/iterable-explore branch from f2c2781 to ad56a6c Compare August 1, 2024 17:12
@tantaman tantaman changed the title explore implementations for operators against NestedIterator design exploration: implementations for operators against NestedIterator Aug 1, 2024
@tantaman tantaman changed the title design exploration: implementations for operators against NestedIterator [rfc] design exploration: implementations for operators against NestedIterator Aug 1, 2024
@tantaman tantaman marked this pull request as ready for review August 1, 2024 17:19
@tantaman tantaman requested review from aboodman and arv August 1, 2024 17:24
const event = Symbol();
const node = Symbol();
type Event = Add | Remove | NoOp;
const ADD = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we just give them string values so we can catch/find if there is any reason to do arithmetic. I don't think there is, but then we'll know for sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One reason to use SMIs over string is efficiency. SMIs are stack allocated and strings heap allocated.

it all depends on how hot things are and with global warming it is going to get pretty hot

import type {JSONObject} from '../../shared/src/json.js';

// eslint-disable-next-line @typescript-eslint/no-explicit-any
type TODO = any;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a silly way to put cast things as any without needing an eslint comment each time 😅

type Remove = typeof REMOVE;
type NoOp = typeof NO_OP;
type Entry<Type = JSONObject> = {
[node]: Type;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the parent row get referred to by this symbol, but child iterators get actual names based on their table?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mergeDistinct needed a way to find and id for a row. Giving the parent row a stable name made that easier to do. Otherwise we'd need to iterate all the keys in the row and find the thing that isn't iterable.

type Remove = typeof REMOVE;
type NoOp = typeof NO_OP;
type Entry<Type = JSONObject> = {
[node]: Type;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to use a symbol here let's try to pick a unique name. How about Entity that is the name we used before. Or Row?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also is it enforced that every value flowing through the pipeline has a unique ID as it does today?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

every row will have a primary key and we can't mix rows from different tables in the same iterable level so I think this is true.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I pressed publish on this review too quickly and my comments were kind of cryptic. Here's what I should have said:

  1. It's better if we pick names for things in the system that are unique and not already used by other concepts. The name "node" is already often used to refer to pipeline nodes. So what about using a different name like "row" or "entity"? The reason this is important is so that when talking about the system we can just use a short name for a concept rather than having to qualify it. Like we can say "entity" rather than "pipeline entry node".
  2. In the existing ivm, we require all things flowing through the pipeline to have a unique ID. Are we going to in the new system too? If we are then how come there's no id field here? is it just because this is a sketch and the id wasn't required, or is its absence important somehow? If there's going to be an ID for each of the thingies flowing through the pipeline, then the name "entity" or "PipelineEntity" makes even more sense.

Sorry for obsessing so much about names, but I think a big part of system design is just choosing good names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed things in the PR that is meant to be merged into main:

I've also gone back to requiring all Entity types to have an id field. We'll need to revisit this when we add compound primary keys which we've discussed before.