ivm2 skeleton: the return. #2115
Conversation
I'm going to keep cleaning this up a bit more but wanted to share now.
@@ -0,0 +1,8 @@
```ts
export type OrderPart = [field: string, direction: 'asc' | 'desc'];
```
It'd be worth some commentary on why `OrderPart` is not a path.
Right now we only support ordering by root level rows. Is that what you had in mind or something else?
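For illustration, a comparator built from a list of order parts might look roughly like this (the `rowComparator` helper and `Row` shape are hypothetical, not part of the PR; fields are top-level only, matching the current root-level-rows limitation):

```typescript
type OrderPart = [field: string, direction: 'asc' | 'desc'];
type Row = Record<string, string | number>;

// Hypothetical helper: builds a comparator from a list of order parts.
// Earlier parts take precedence; later parts break ties.
function rowComparator(order: OrderPart[]): (a: Row, b: Row) => number {
  return (a, b) => {
    for (const [field, direction] of order) {
      const av = a[field];
      const bv = b[field];
      if (av === bv) continue;
      const cmp = av < bv ? -1 : 1;
      return direction === 'asc' ? cmp : -cmp;
    }
    return 0;
  };
}
```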
```ts
/**
 * A simple output that consumes and stores all pushed changes.
 * TODO(aa): Extend to support storing subdiffs too.
```
subdiff code you could start from: #2113
I removed the TODO. I'm going to try something: using TODO to mean stuff that we actually should do soon, not stuff that we just haven't needed yet. But yes, thanks for the start there.
```ts
 * implement them client-side where we lack a real database 😢.
 */
export type Constraint = {
  field: string;
```
this should be a path, no?
Why don't constraints have operators? Constraints are only ever equality?
It will probably need to be a path, yes. I wanted to leave that to a future change that works through pull as a focused thing.
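To make the equality-only question above concrete, here is a sketch of what an equality-only constraint check could look like (the `value` field and `matchesConstraint` helper are assumptions for illustration, not shown in the PR):

```typescript
type Value = string | number | boolean | null;
type Row = Record<string, Value>;

// Assumed shape: constraints pair a top-level field with a value and
// are equality-only (no operator field).
type Constraint = {
  field: string;
  value: Value;
};

// Illustrative helper: a row satisfies a constraint iff the named
// top-level field strictly equals the constraint's value.
function matchesConstraint(row: Row, constraint: Constraint): boolean {
  return row[constraint.field] === constraint.value;
}
```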
```ts
 */
export type Response = {
  diff: TreeDiff;
  appliedFilters: Filter[];
```
why is responding with `appliedFilters` important?
Adding:

```ts
// The filters that the source applied to the response. This allows the
// Filter operator to know if it needs to reapply the filter.
```
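A minimal sketch of how a downstream Filter operator might consume this (the `Filter` shape and `filtersToReapply` helper are hypothetical; this assumes filters can be compared by object identity):

```typescript
// Assumed shape for a simple filter, for illustration only.
type Filter = {field: string; op: string; value: unknown};

// Illustrative: the Filter operator only reapplies the filters that the
// source did not already apply to the response.
function filtersToReapply(requested: Filter[], applied: Filter[]): Filter[] {
  return requested.filter(f => !applied.includes(f));
}
```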
```ts
// If null, it means include all subdiffs.
restrictToSubdiffs: string[] | null;
```
```ts
// TODO: startAt, direction for limit?
```
limit should almost never be in a request. The upstream can't know how many ways the data may get filtered before reaching its destination. The only way `limit` would make sense is if it is possible to collect every single constraint on the path from destination to source. If the source can honor all of those constraints then it can honor the limit.

`startAt` is encapsulated by constraint.

`direction` should be here via ordering, although we have divergent opinions on this.
> limit should almost never be in a request.

Agreed. I meant that we will need `startAt` and `direction` in order to implement limit (i.e., topk).

> startAt is encapsulated by constraint

I go back and forth about this. Maybe you're correct.

> direction should be here via ordering although we have divergent opinions on this.

I don't understand this comment. I think we'll figure this out together as we get into pull.
```ts
optionalFilters: Filter[];
```
```ts
// If null, it means include all subdiffs.
restrictToSubdiffs: string[] | null;
```
can you explain this a bit more? How does a requestor know if subdiffs should be included?

If this is related to sibling subqueries in select positions, I think a custom join is a better route to doing this.

```ts
function joinManySiblings(parent: ChangeStream, siblings: ChangeStream[]): ChangeStream {
  // when any one sibling changes, do not join any other siblings
}
```

That structure, while a new concept, is much less prone to bugs as its responsibility is explicit rather than overloading an existing operator that needs to switch on `restrictToSubdiffs`.
Consider the case of `topk(comments, 100)`. When a comment for issue 42 is deleted, we must refill the window. The data we are looking for is issue 42's comments. We do not want all the rest of the goop associated with issue 42.

I think an elegant way to do this is to have topk pull on issue 42, but with information in the pull that restricts what to return.
```ts
 * once because they are coupled to the state of the datastore, which would
 * need to be reset if we wanted to iterate again. We want to enforce that
 * they can only be iterated once.
 * - If the iterator is representing some data that was pushed, it must be
```
What is your thinking on the "mixed" case? Data is pushed but then we hit a sub-query which represents pulled data.
Will a TreeDiff have iterators with both of these modes?
Yeah they will often be mixed - that's why this is a feature of the iterator, not the entire tree.
issue -> comment -> revision

When a comment is pushed, we will pull on issue with a constraint. Then the joined result flows down to the next join and we get the revisions for the new comment.

In this example only the iterator of comment changes is `needy`. I think the general rule is that the iterator that wraps data that was pushed is `needy` and all other iterators are `normal`.
```ts
 * need to be reset if we wanted to iterate again. We want to enforce that
 * they can only be iterated once.
 * - If the iterator is representing some data that was pushed, it must be
 * fully consumed. Otherwise, the resulting state of the query will be
```
Referencing `topk` as an example of why a query would be incomplete would be useful here.
Added this:

```ts
 *
 * For an example of (2), consider a query like:
 *
 * z.issue.select().orderBy('id').limit(10);
 *
 * On the first pull, the ChangeStream we receive will be `normal`. We can stop
 * consuming it at any point because it is sorted. When we've received 10 rows,
 * we can stop consuming it and the query will be complete.
 *
 * Now consider that someone pushes two changes into the pipeline. These changes
 * will not be sorted and so we must consume them all to ensure that the query
 * results are correct.
```

... but maybe it would be better to call the type `sorted` or `unsorted`. I will try that.
I added a `sorted` property to TreeDiff. I realized that topk will need to check whether output is sorted or not; it can't just rely on whether the change came from push/pull.
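As a rough sketch of how a limit/topk operator could use such a flag (the `firstK` helper is hypothetical and assumes rows are ordered by `id`; the real operator will differ):

```typescript
type Row = {id: number};

// Illustrative: with a sorted stream we can stop after k rows; with an
// unsorted stream (e.g. from a push) we must consume everything, then
// order the result ourselves.
function firstK(rows: Iterable<Row>, k: number, sorted: boolean): Row[] {
  if (sorted) {
    const out: Row[] = [];
    for (const row of rows) {
      out.push(row);
      if (out.length === k) break; // safe: upstream is already ordered
    }
    return out;
  }
  // Unsorted: take it all and order afterwards.
  return [...rows].sort((a, b) => a.id - b.id).slice(0, k);
}
```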
```ts
  return result;
}

return() {
```
please implement `throw` as well so the base iterator can be cleaned up on exceptions.
you'll need to call `this.#iterator.return?.()` as well to ensure any resources (like the SQLite statement) are freed / reset.
ah, good catch. thank you!
```ts
 * joins, which is more efficient since it reduces the number of rows that must
 * be joined.
 *
 * This may not be sufficient in the future, when we have subqueries in the
```
subqueries in the where position as well as `where` clauses that reference the parent. I think the latter is likely a quick follow-up.

```ts
z.issue.select(
  q => q.include('parent').where($parent.modified, '>', 'modified')
)
```
Currently I am worried about the ballooning complexity of `ivm`. Maybe I will become as confident as you. Right now I want to be cautious about feature inclusion. Let's get subqueries and limit working well end to end and revisit.
```ts
  }
}

function matchesPredicate(lhs: Value, op: SimpleOperator, rhs: Value): boolean {
```
Heh, I guess it depends on the difference between closure/function-call overhead and switch overhead on the string. I assumed the strings would be interned, but who knows. I guess we'll see it in a profile if it matters.
Or maybe you can do one of your microbenchmarks to answer it concretely.
Yeah, they're basically identical and I think we'll want to be able to use views as sources one day.
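For reference, the switch-on-string shape under discussion might look like this (a sketch only; `Value` is narrowed to `number` here for brevity, and the PR's real operator set may differ):

```typescript
type SimpleOperator = '=' | '!=' | '<' | '<=' | '>' | '>=';

// Sketch: dispatch on the operator string directly. Whether this beats
// precompiled closures is exactly the microbenchmark question above.
function matchesPredicate(lhs: number, op: SimpleOperator, rhs: number): boolean {
  switch (op) {
    case '=':
      return lhs === rhs;
    case '!=':
      return lhs !== rhs;
    case '<':
      return lhs < rhs;
    case '<=':
      return lhs <= rhs;
    case '>':
      return lhs > rhs;
    case '>=':
      return lhs >= rhs;
  }
}
```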
```ts
 * This data is kept in sorted order as downstream pipelines will
 * always expect the data they receive from `pull` to be in sorted order.
 */
export class MemoryInput implements Input {
```
`Input` is a weird name to me given the input retains everything that has been input.
Yea it might be overly clever. I kinda like it. Let's sit with it for a bit.
```ts
readonly #input: Input;
readonly #predicate: Filter;

#output: Output | null = null;
```
@arv is always telling me to use `undefined`.

```ts
#output?: Output | undefined;
```
Not sure if @arv would still feel this way here. I think he's talking about cases where the fields are fields of objects that people are constructing and passing around as literals. In those cases it is convenient to type the field as `{foo?: Foo | undefined}` because the caller can just say `{}`.

The question of whether to use (missing field), `undefined`, or `null` in JS: a question as old as time. I feel like here it makes sense to have the field always present (no `?`) since it is behaving like a class private field, not a member of an option bag. And since it's always present I don't like to use `undefined` (I prefer to treat a missing field and a present field with `undefined` value as the same thing).
I'm fine with `null` here, but when/if this gets used as a param to something else it will most likely make more sense as `undefined`.

There are no hard rules (except no megamorphic classes!)
> There are no hard rules

*twitch twitch*
```ts
*#applyChanges(changes: Iterable<Change>) {
  for (const change of changes) {
    if (change.type === 'add') {
      this.#tree = this.#tree.with(change.row, undefined);
      yield change;
    } else {
      yield change;
      this.#tree = this.#tree.without(change.row);
    }
  }
}
```
technically we can no longer do this as soon as a source has more than one pipeline attached to it. The reason being that the tree will no longer be in the correct state when the second pipeline starts.
Relevant for @grgbkr & https://www.notion.so/replicache/IVM-Nextsteps-a09fe0c5fc6c4beea117d7e6908790b2?p=5fee4873f8664f8189b1bf8954fc604e&pm=s
The tree is immutable so we can have every historical version of it... Maybe there's something there to be leveraged. Another idea is to run each pipeline one step at a time in turn:

```ts
for (const pipe of pipelines) {
  pipe.step();
}
```

and have a `fork` operator directly attached to a source that can send the same value down all paths.

Ignore that second idea. Filters break it immediately.
Yup, aware of these. @grgbkr when you get to this let me know and we can talk about it.
OK so I spent the weekend playing with this and trying different things on for size. As I suspected, this ended up pretty close to #2109, just with some simplifications and renaming:

- … (`data.ts`). I think this sets us up nicely for multi-column IDs from the beginning.
- Renamed `TreeIterator` to `TreeDiff` after trying on a lot of names. I really struggled with keeping what was going on in my head conceptually and naming this "diff" helped me to keep it straight.
- Added a `ChangeStream` helper class that enforces the invariant that an iterator/stream can only be consumed once. It also provides a sanity check in case a push diff is not completely consumed.
- Removed `DifferenceStream`. I don't understand how the forking that it did was possible to implement correctly with `pull()` and only-once iteration. Also it just wasn't doing much and I was on a quest to remove everything until needed. Hopefully it's not proven needed tomorrow. (In seriousness, I think we'll have to have a first-class concept of Fork that is more principled.)
- Made `Operator` just be `Input` and `Output`. So cute.
- Removed `Source` and `View`. They look like just special inputs and outputs to me.

I also spent a great deal of time thinking about how push and pull flow up and down the pipeline and I feel pretty good about it now. I have a fairly complete sketch on paper. It's hard, but it does seem like it will work and I can't think of any way it could be better.
I had a lot of fun working this all through. Thanks for letting me take a spin at it and put it "in my own words" so to speak. I think this has made this whole thing a lot more concrete in my head.