-
Notifications
You must be signed in to change notification settings - Fork 96
[rfc] design exploration: implementations for operators against NestedIterator #2103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
and maybe this obviates `genCached` too?
8177b75
to
0e379a9
Compare
f2c2781
to
ad56a6c
Compare
const event = Symbol(); | ||
const node = Symbol(); | ||
type Event = Add | Remove | NoOp; | ||
const ADD = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we just give them string values so we can catch/find if there is any reason to do arithmetic. I don't think there is, but then we'll know for sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One reason to use SMIs over string is efficiency. SMIs are stack allocated and strings heap allocated.
it all depends on how hot things are and with global warming it is going to get pretty hot
import type {JSONObject} from '../../shared/src/json.js'; | ||
|
||
// eslint-disable-next-line @typescript-eslint/no-explicit-any | ||
type TODO = any; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a silly way to put cast things as any
without needing an eslint
comment each time 😅
type Remove = typeof REMOVE; | ||
type NoOp = typeof NO_OP; | ||
type Entry<Type = JSONObject> = { | ||
[node]: Type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the parent row get referred to by this symbol, but child iterators get actual names based on their table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mergeDistinct
needed a way to find and id for a row. Giving the parent row a stable name made that easier to do. Otherwise we'd need to iterate all the keys in the row and find the thing that isn't iterable.
type Remove = typeof REMOVE; | ||
type NoOp = typeof NO_OP; | ||
type Entry<Type = JSONObject> = { | ||
[node]: Type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to use a symbol here let's try to pick a unique name. How about Entity
that is the name we used before. Or Row
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also is it enforced that every value flowing through the pipeline has a unique ID as it does today?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
every row will have a primary key and we can't mix rows from different tables in the same iterable level so I think this is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I pressed publish on this review too quickly and my comments were kind of cryptic. Here's what I should have said:
- It's better if we pick names for things in the system that are unique and not already used by other concepts. The name "node" is already often used to refer to pipeline nodes. So what about using a different name like "row" or "entity"? The reason this is important is so that when talking about the system we can just use a short name for a concept rather than having to qualify it. Like we can say "entity" rather than "pipeline entry node".
- In the existing ivm, we require all things flowing through the pipeline to have a unique ID. Are we going to in the new system too? If we are then how come there's no id field here? is it just because this is a sketch and the id wasn't required, or is its absence important somehow? If there's going to be an ID for each of the thingies flowing through the pipeline, then the name "entity" or "PipelineEntity" makes even more sense.
Sorry for obsessing so much about names, but I think a big part of system design is just choosing good names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've renamed things in the PR that is meant to be merged into main:
- https://github.com/rocicorp/mono/pull/2109/files : packages/zql/src/zql/ivm-2/iterable-tree.ts
I've also gone back to requiring all Entity
types to have an id
field. We'll need to revisit this when we add compound primary keys which we've discussed before.
type Event = Add | Remove | NoOp; | ||
const ADD = 1; | ||
const REMOVE = -1; | ||
const NO_OP = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Total nit but I'd call this NOP
out of tradition and ease of typing.
}[]; | ||
|
||
function* filter( | ||
path: (string | typeof node)[], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only valid for node
to be the last entry in the path right? Little awkward since the path represented can be invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that's true. This should work type path = [...string[], symbol];
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jeez, does that really work? u crazy typescript.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alternately we can just make the path be the prefix and not include / assume the last element.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right. The last element is always the symbol now. When I first started this (a couple commits back) it could be a string and have any name.
cb: (v: unknown) => boolean, | ||
): IterableIterator<Entry> { | ||
const [head, ...tail] = path; | ||
for (const row of iterable) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit is really nice.
} | ||
} | ||
|
||
function* loopJoin( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the significance of the word loop here. Explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just that it is a dumb nestedLoopJoin
/ an n^2 join.
function* loopJoin( | ||
left: Iterable<Entry>, | ||
right: Iterable<Entry>, | ||
leftItemPath: (string | typeof node)[], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need the ability to join at a level other than the top on the left side? It's not exercised below in your tests, and i can't think of a case from our featureset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The right side is always the top. The left side could be an arbitrary depth.
issue -> comments -> revisions -> author
the revision would be added inside the comments
iterable of the issue.
It's not exercised below in your tests
test('loop join with a loop join')
exercises it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I was thinking of it backward. Thanks.
expect(numVisits).toEqual(4 * issueSource.length); | ||
}); | ||
|
||
class RestartableIterableIterator<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cute
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this.
Now that I've seen this and reread the notion doc things are slowly falling into place.
This will need a lot of comments and more exhaustive tests but I understand where this is taking us.
@@ -0,0 +1,614 @@ | |||
import {expect, test} from 'vitest'; | |||
import type {JSONObject} from '../../shared/src/json.js'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import type {JSONObject} from '../../shared/src/json.js'; | |
import type {JSONObject} from 'shared/src/json.js'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to get vscode to default to the correct path? I find myself always manually fixing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so... Try this pref:
javascript.preferences.importModuleSpecifier
id: number; | ||
title: string; | ||
}; | ||
const event = Symbol(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should include a description in these?
In the past I haven't because symbols are generally not part of a public API so no one needs to know what they are but for debugging things it is nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding descriptions in the "production version" of this.
const event = Symbol(); | ||
const node = Symbol(); | ||
type Event = Add | Remove | NoOp; | ||
const ADD = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One reason to use SMIs over string is efficiency. SMIs are stack allocated and strings heap allocated.
it all depends on how hot things are and with global warming it is going to get pretty hot
iterable: Iterable<Entry>, | ||
cb: (v: unknown) => boolean, | ||
): IterableIterator<Entry> { | ||
const [head, ...tail] = path; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be a bit more efficient using:
const head = path[0];
and tail can be a skip iterator that skips 1
function* filter( | ||
path: (string | typeof node)[], | ||
iterable: Iterable<Entry>, | ||
cb: (v: unknown) => boolean, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are commonly called p
as in predicate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, why unknown
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think it should be filter<T>( ..., p: (v: T) => boolean)
Only trying to get the general idea across to help illustrate and refine the design. I'll end up closing this PR without merging it and maybe referencing it from the design doc.
for (let i = 0; i < k; i++) { | ||
yield sorted[i]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or yield* sorted.slice(0, k)
} | ||
|
||
for (const iter of iters) { | ||
iter.return!(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iter.return?.()
return new RestartableIterableIterator(this.#func, true); | ||
} | ||
next() { | ||
return this.#iter!.next(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#iter
is only set if invoke is true in the constructor. Is this correc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is only used by tests but maybe:
restartable(...): Iterable
or split into two classes since the new RestartableIterableIterator(..., false)
only works as an Iterable
but new RestartableIterableIterator(..., true)
works as an IterableIterator
.
I wish these interfaces were less confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this whole restartable thing confusing myself.
Like you could technically invoke it in an invalid state (calling next
before the generator was ever called) which strikes me as really strange.
Do you have any thoughts on how to fix this?
This should be the final form (in terms of types) of
NestedIterable
. The operators (join/map/filter) are sketches to confirm the shape ofNestedIterable
is what we need.NestedIterable
is anIterable
ofEntry
.I've explored:
As well as:
The latter set uses
restartable
so many iterators can be gathered from the baseIterable
, allowing all forks to get the same data.Related document: https://www.notion.so/replicache/NestedIterable-5123f11b877e41b7bc9f00486d491d8b#adb39380a6f74402ace02cb85fe0c405
There are some possible future explorations:
join
when dealing with deltas and many:1 & many:many relationshipssort
information totopk
along with theiterable
But it seems like diminishing returns at this point.