ivm2 skeleton: the return. #2115
Conversation
I'm going to keep cleaning this up a bit more but wanted to share now.
@@ -0,0 +1,8 @@
```ts
export type OrderPart = [field: string, direction: 'asc' | 'desc'];
```
It'd be worth some commentary on why `OrderPart` is not a path.
Right now we only support ordering by root level rows. Is that what you had in mind or something else?
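For illustration, a comparator built from a list of order parts might look roughly like this (the `rowComparator` helper and `Row` shape are hypothetical, not part of the PR; fields are top-level only, matching the current root-level-rows limitation):

```typescript
type OrderPart = [field: string, direction: 'asc' | 'desc'];
type Row = Record<string, string | number>;

// Hypothetical helper: builds a comparator from a list of order parts.
// Earlier parts take precedence; later parts break ties.
function rowComparator(order: OrderPart[]): (a: Row, b: Row) => number {
  return (a, b) => {
    for (const [field, direction] of order) {
      const av = a[field];
      const bv = b[field];
      if (av === bv) continue;
      const cmp = av < bv ? -1 : 1;
      return direction === 'asc' ? cmp : -cmp;
    }
    return 0;
  };
}
```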
```ts
/**
 * A simple output that consumes and stores all pushed changes.
 * TODO(aa): Extend to support storing subdiffs too.
```
subdiff code you could start from: #2113
I removed the TODO. I'm going to try something: using TODO to mean stuff that we actually should do soon, not stuff that we just haven't needed yet. But yes, thanks for the start there.
```ts
 * implement them client-side where we lack a real database 😢.
 */
export type Constraint = {
  field: string;
```
this should be a path, no?
Why don't constraints have operators? Constraints are only ever equality?
It will probably need to be a path, yes. I wanted to leave that to a future change that works through pull as a focused thing.
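To make the equality-only question above concrete, here is a sketch of what an equality-only constraint check could look like (the `value` field and `matchesConstraint` helper are assumptions for illustration, not shown in the PR):

```typescript
type Value = string | number | boolean | null;
type Row = Record<string, Value>;

// Assumed shape: constraints pair a top-level field with a value and
// are equality-only (no operator field).
type Constraint = {
  field: string;
  value: Value;
};

// Illustrative helper: a row satisfies a constraint iff the named
// top-level field strictly equals the constraint's value.
function matchesConstraint(row: Row, constraint: Constraint): boolean {
  return row[constraint.field] === constraint.value;
}
```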
```ts
 */
export type Response = {
  diff: TreeDiff;
  appliedFilters: Filter[];
```
why is responding with `appliedFilters` important?
Adding:

```ts
// The filters that the source applied to the response. This allows the
// Filter operator to know if it needs to reapply the filter.
```
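A minimal sketch of how a downstream Filter operator might consume this (the `Filter` shape and `filtersToReapply` helper are hypothetical; this assumes filters can be compared by object identity):

```typescript
// Assumed shape for a simple filter, for illustration only.
type Filter = {field: string; op: string; value: unknown};

// Illustrative: the Filter operator only reapplies the filters that the
// source did not already apply to the response.
function filtersToReapply(requested: Filter[], applied: Filter[]): Filter[] {
  return requested.filter(f => !applied.includes(f));
}
```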
```ts
// If null, it means include all subdiffs.
restrictToSubdiffs: string[] | null;
```
```ts
// TODO: startAt, direction for limit?
```
limit should almost never be in a request. The upstream can't know how many ways the data may get filtered before reaching its destination. The only way `limit` would make sense is if it is possible to collect every single constraint on the path from destination to source. If the source can honor all of those constraints then it can honor the limit.

`startAt` is encapsulated by constraint.

`direction` should be here via ordering, although we have divergent opinions on this.
> limit should almost never be in a request.

Agreed. I meant that we will need `startAt` and `direction` in order to implement limit (i.e., topk).

> startAt is encapsulated by constraint

I go back and forth about this. Maybe you're correct.

> direction should be here via ordering although we have divergent opinions on this.

I don't understand this comment. I think we'll figure this out together as we get into pull.
```ts
optionalFilters: Filter[];
```
```ts
// If null, it means include all subdiffs.
restrictToSubdiffs: string[] | null;
```
can you explain this a bit more? How does a requestor know if subdiffs should be included?

If this is related to sibling subqueries in select positions, I think a custom join is a better route to doing this.

```ts
function joinManySiblings(parent: ChangeStream, siblings: ChangeStream[]): ChangeStream {
  // when any one sibling changes, do not join any other siblings
}
```

That structure, while a new concept, is much less prone to bugs as its responsibility is explicit rather than overloading an existing operator that needs to switch on `restrictToSubdiffs`.
Consider the case of `topk(comments, 100)`. When a comment for issue 42 is deleted, we must refill the window. The data we are looking for is issue 42's comments. We do not want all the rest of the goop associated with issue 42.

I think an elegant way to do this is to have topk pull on issue 42, but with information in the pull that restricts what to return.
```ts
 * once because they are coupled to the state of the datastore, which would
 * need to be reset if we wanted to iterate again. We want to enforce that
 * they can only be iterated once.
 * - If the iterator is representing some data that was pushed, it must be
```
What is your thinking on the "mixed" case? Data is pushed but then we hit a sub-query which represents pulled data.
Will a TreeDiff have iterators with both of these modes?
Yeah they will often be mixed - that's why this is a feature of the iterator, not the entire tree.
issue -> comment -> revision

When a comment is pushed, we will pull on issue with a constraint. Then the joined result flows down to the next join and we get the revisions for the new comment.

In this example only the iterator of comment changes is `needy`. I think the general rule is that the iterator that wraps data that was pushed is `needy` and all other iterators are `normal`.
```ts
 * need to be reset if we wanted to iterate again. We want to enforce that
 * they can only be iterated once.
 * - If the iterator is representing some data that was pushed, it must be
 * fully consumed. Otherwise, the resulting state of the query will be
```
Referencing `topk` as an example of why a query would be incomplete would be useful here.
Added this:

```ts
 *
 * For an example of (2), consider a query like:
 *
 * z.issue.select().orderBy('id').limit(10);
 *
 * On the first pull, the ChangeStream we receive will be `normal`. We can stop
 * consuming it at any point because it is sorted. When we've received 10 rows,
 * we can stop consuming it and the query will be complete.
 *
 * Now consider that someone pushes two changes into the pipeline. These changes
 * will not be sorted and so we must consume them all to ensure that the query
 * results are correct.
```

... but maybe it would be better to call the type `sorted` or `unsorted`. I will try that.
I added a `sorted` property to TreeDiff. I realized that topk will need to check whether output is sorted or not; it can't just rely on whether the change came from push/pull.
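As a rough sketch of how a limit/topk operator could use such a flag (the `firstK` helper is hypothetical and assumes rows are ordered by `id`; the real operator will differ):

```typescript
type Row = {id: number};

// Illustrative: with a sorted stream we can stop after k rows; with an
// unsorted stream (e.g. from a push) we must consume everything, then
// order the result ourselves.
function firstK(rows: Iterable<Row>, k: number, sorted: boolean): Row[] {
  if (sorted) {
    const out: Row[] = [];
    for (const row of rows) {
      out.push(row);
      if (out.length === k) break; // safe: upstream is already ordered
    }
    return out;
  }
  // Unsorted: take it all and order afterwards.
  return [...rows].sort((a, b) => a.id - b.id).slice(0, k);
}
```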
```ts
  return result;
}

return() {
```
please implement `throw` as well so the base iterator can be cleaned up on exceptions.
you'll need to call `this.#iterator.return?.()` as well to ensure any resources (like the SQLite statement) are freed / reset.
ah, good catch. thank you!
```ts
 * joins, which is more efficient since it reduces the number of rows that must
 * be joined.
 *
 * This may not be sufficient in the future, when we have subqueries in the
```
subqueries in the where position as well as `where` clauses that reference the parent. I think the latter is likely a quick follow-up.

```ts
z.issue.select(
  q => q.include('parent').where($parent.modified, '>', 'modified')
)
```
Currently I am worried about the ballooning complexity of `ivm`. Maybe I will become as confident as you. Right now I want to be cautious about feature inclusion. Let's get subqueries and limit working well end to end and revisit.
```ts
  }
}

function matchesPredicate(lhs: Value, op: SimpleOperator, rhs: Value): boolean {
```
Heh, I guess it depends on the difference between closure/function-call overhead and switch overhead on the string. I assumed the strings would be interned, but who knows. I guess we'll see it in a profile if it matters.
Or maybe you can do one of your microbenchmarks to answer it concretely.
Yeah, they're basically identical and I think we'll want to be able to use views as sources one day.
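For reference, the switch-on-string shape under discussion might look like this (a sketch only; `Value` is narrowed to `number` here for brevity, and the PR's real operator set may differ):

```typescript
type SimpleOperator = '=' | '!=' | '<' | '<=' | '>' | '>=';

// Sketch: dispatch on the operator string directly. Whether this beats
// precompiled closures is exactly the microbenchmark question above.
function matchesPredicate(lhs: number, op: SimpleOperator, rhs: number): boolean {
  switch (op) {
    case '=':
      return lhs === rhs;
    case '!=':
      return lhs !== rhs;
    case '<':
      return lhs < rhs;
    case '<=':
      return lhs <= rhs;
    case '>':
      return lhs > rhs;
    case '>=':
      return lhs >= rhs;
  }
}
```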
```ts
 * This data is kept in sorted order as downstream pipelines will
 * always expect the data they receive from `pull` to be in sorted order.
 */
export class MemoryInput implements Input {
```
`Input` is a weird name to me given the input retains everything that has been input.
Yea it might be overly clever. I kinda like it. Let's sit with it for a bit.
```ts
readonly #input: Input;
readonly #predicate: Filter;

#output: Output | null = null;
```
@arv is always telling me to use `undefined`.

```ts
#output?: Output | undefined;
```
Not sure if @arv would still feel this way here. I think he's talking about cases where the fields are fields of objects that people are constructing and passing around as literals. In those cases it is convenient to type the field as `{foo?: Foo | undefined}` because the caller can just say `{}`.

The question of whether to use (missing field), `undefined`, or `null` in JS: a question as old as time. I feel like here it makes sense to have the field always present (no `?`) since it is behaving like a class private field, not a member of an option bag. And since it's always present I don't like to use `undefined` (I prefer to treat a missing field and a present field with `undefined` value as the same thing).
I'm fine with `null` here, but when/if this gets used as a param to something else it will most likely make more sense as `undefined`.

There are no hard rules (except no megamorphic classes!)
> There are no hard rules

*twitch twitch*
```ts
*#applyChanges(changes: Iterable<Change>) {
  for (const change of changes) {
    if (change.type === 'add') {
      this.#tree = this.#tree.with(change.row, undefined);
      yield change;
    } else {
      yield change;
      this.#tree = this.#tree.without(change.row);
    }
  }
}
```
technically we can no longer do this as soon as a source has more than one pipeline attached to it. The reason being that the tree will no longer be in the correct state when the second pipeline starts.
Relevant for @grgbkr & https://www.notion.so/replicache/IVM-Nextsteps-a09fe0c5fc6c4beea117d7e6908790b2?p=5fee4873f8664f8189b1bf8954fc604e&pm=s
The tree is immutable so we can have every historical version of it... Maybe there's something there to be leveraged. Another idea is to run each pipeline one step at a time in turn:

```ts
for (const pipe of pipelines) {
  pipe.step();
}
```

and have a `fork` operator directly attached to a source that can send the same value down all paths.

Ignore that second idea. Filters break it immediately.
Yup, aware of these. @grgbkr when you get to this let me know and we can talk about it.
OK so I spent the weekend playing with this and trying different things on for size. As I suspected, this ended up pretty close to #2109, just with some simplifications and renaming:

- … (`data.ts`). I think this sets us up nicely for multi-column IDs from the beginning.
- Renamed `TreeIterator` to `TreeDiff` after trying on a lot of names. I really struggled with keeping what was going on in my head conceptually and naming this "diff" helped me to keep it straight.
- Added a `ChangeStream` helper class that enforces the invariant that an iterator/stream can only be consumed once. It also provides a sanity check in case a push diff is not completely consumed.
- Removed `DifferenceStream`. I don't understand how the forking that it did was possible to implement correctly with `pull()` and only-once iteration. Also it just wasn't doing much and I was on a quest to remove everything until needed. Hopefully it's not proven needed tomorrow. (In seriousness, I think we'll have to have a first-class concept of Fork that is more principled.)
- Made `Operator` just be `Input` and `Output`. So cute.
- Removed `Source` and `View`. They look like just special inputs and outputs to me.

I also spent a great deal of time thinking about how push and pull flow up and down the pipeline and I feel pretty good about it now. I have a fairly complete sketch on paper. It's hard, but it does seem like it will work and I can't think of any way it could be better.
I had a lot of fun working this all through. Thanks for letting me take a spin at it and put it "in my own words" so to speak. I think this has made this whole thing a lot more concrete in my head.