fix(zero-cache): improve IVM time-slicing to avoid I/O starvation #4989
Conversation
Branch: darkgnotic/avoid-io-starvation
Testbed: self-hosted

| Benchmark | Result (ops/s) | Δ% vs. Baseline | Baseline (ops/s) | Lower Boundary (ops/s, Limit %) |
|---|---|---|---|---|
| zpg: (pk lookup) select * from track where id = 3163 | 912.44 | -9.09% | 1,003.63 | 849.21 (93.07%) |
| zpg: (secondary index lookup) select * from track where album_id = 248 | 1,111.54 | +3.70% | 1,071.89 | 969.66 (87.24%) |
| zpg: (table scan) select * from album | 706.59 | -0.86% | 712.73 | 609.52 (86.26%) |
| zpg: OR with empty branch and limit | 890.98 | -3.51% | 923.36 | 837.47 (93.99%) |
| zpg: OR with empty branch and limit with exists | 728.22 | -4.30% | 760.91 | 676.93 (92.96%) |
| zpg: all playlists | 5.57 | +0.75% | 5.53 | 5.37 (96.36%) |
| zpg: scan with one depth related | 428.75 | -0.01% | 428.80 | 410.74 (95.80%) |
| zql: (pk lookup) select * from track where id = 3163 | 115,312.11 | -6.36% | 123,145.04 | 106,318.44 (92.20%) |
| zql: (secondary index lookup) select * from track where album_id = 248 | 2,289.03 | +5.47% | 2,170.28 | 1,998.47 (87.31%) |
| zql: (table scan) select * from album | 6,726.54 | +1.59% | 6,621.51 | 6,279.81 (93.36%) |
| zql: OR with empty branch and limit | 60,973.26 | +8.36% | 56,267.43 | 45,863.32 (75.22%) |
| zql: OR with empty branch and limit with exists | 13,461.00 | +9.02% | 12,347.46 | 11,003.57 (81.74%) |
| zql: all playlists | 4.38 | +2.96% | 4.26 | 4.01 (91.51%) |
| zql: edit for limited query, inside the bound | 231,134.68 | +4.14% | 221,942.63 | 209,047.60 (90.44%) |
| zql: edit for limited query, outside the bound | 237,308.16 | +7.22% | 221,335.40 | 188,870.79 (79.59%) |
| zql: push into limited query, inside the bound | 110,745.89 | +0.76% | 109,913.92 | 106,918.94 (96.54%) |
| zql: push into limited query, outside the bound | 402,209.91 | -5.57% | 425,937.38 | 389,870.69 (96.93%) |
| zql: push into unlimited query | 350,928.90 | +3.13% | 340,270.20 | 316,378.96 (90.15%) |
| zql: scan with one depth related | 505.94 | +6.24% | 476.25 | 438.74 (86.72%) |
| zqlite: (pk lookup) select * from track where id = 3163 | 42,927.62 | +0.42% | 42,746.99 | 40,811.21 (95.07%) |
| zqlite: (secondary index lookup) select * from track where album_id = 248 | 10,188.09 | +0.58% | 10,129.00 | 9,849.35 (96.68%) |
| zqlite: (table scan) select * from album | 1,364.97 | +1.87% | 1,339.94 | 1,252.98 (91.80%) |
| zqlite: OR with empty branch and limit | 17,909.44 | -0.33% | 17,969.34 | 17,365.29 (96.96%) |
| zqlite: OR with empty branch and limit with exists | 5,663.73 | +3.66% | 5,463.62 | 5,154.76 (91.01%) |
| zqlite: all playlists | 1.43 | +2.55% | 1.40 | 1.32 (92.21%) |
| zqlite: edit for limited query, inside the bound | 121,757.37 | +2.53% | 118,747.75 | 111,557.94 (91.62%) |
| zqlite: edit for limited query, outside the bound | 124,235.44 | +2.86% | 120,784.83 | 113,264.49 (91.17%) |
| zqlite: push into limited query, inside the bound | 3,811.57 | -0.59% | 3,834.36 | 3,690.43 (96.82%) |
| zqlite: push into limited query, outside the bound | 145,646.49 | +0.82% | 144,454.98 | 137,132.93 (94.15%) |
| zqlite: push into unlimited query | 130,368.07 | +3.98% | 125,373.52 | 113,544.68 (87.10%) |
| zqlite: scan with one depth related | 163.74 | +0.09% | 163.59 | 153.14 (93.52%) |
Branch: darkgnotic/avoid-io-starvation
Testbed: Linux

| Benchmark | Size (KB) | Δ% vs. Baseline | Baseline (KB) | Upper Boundary (KB, Limit %) |
|---|---|---|---|---|
| zero-package.tgz | 1,371.10 | +0.02% | 1,370.76 | 1,398.18 (98.06%) |
| zero.js | 225.67 | 0.00% | 225.67 | 230.19 (98.04%) |
| zero.js.br | 63.05 | 0.00% | 63.05 | 64.31 (98.04%) |
Branch: darkgnotic/avoid-io-starvation
Testbed: self-hosted

| Benchmark | Result (ops/s x 1e3) | Δ% vs. Baseline | Baseline (ops/s x 1e3) | Lower Boundary (ops/s x 1e3, Limit %) |
|---|---|---|---|---|
| src/client/custom.bench.ts > big schema | 908.58 | +10.62% | 821.33 | 655.20 (72.11%) |
| src/client/zero.bench.ts > basics > All 1000 rows x 10 columns (numbers) | 2.86 | +2.13% | 2.80 | 2.68 (93.71%) |
| src/client/zero.bench.ts > pk compare > pk = N | 44.18 | +1.05% | 43.72 | 41.95 (94.94%) |
| src/client/zero.bench.ts > with filter > Lower rows 500 x 10 columns (numbers) | 4.15 | -0.14% | 4.16 | 3.99 (95.99%) |
```diff
   q.id,
   q.ast,
-  timer.start(),
+  timer.startWithoutYielding(),
```
I think this would have been cleaner / clearer if I was able to `await timer.start()` here like I do in the other places, but that would then require this generator (and the other generators that iterate over Database results) to be `AsyncIterable`s instead of `Iterable`s.

Would that have been expensive, i.e. to:

- wrap the sqlite `Iterable`s as `AsyncIterable`s, and
- add an `await` to every iterator call in the `#processChanges()` loop,

when most of the time the iterator fetch is synchronous, or is that essentially negligible?
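For concreteness, a minimal sketch of the constraint (the `timer` shape and generator bodies are illustrative, not the actual `#processChanges()` code): a sync generator cannot contain `await`, so awaiting `timer.start()` forces the generator, and every consumer of it, onto the async-iteration protocol.

```ts
// Illustrative only: why `await timer.start()` pushes the generator (and its
// consumers) from Iterable to AsyncIterable. `timer` is a stand-in here.
declare const timer: {start(): Promise<void>};

// Sync generator: callers iterate with `for...of`; `await` is not allowed inside.
function* changesSync(rows: Iterable<string>): Iterable<string> {
  for (const row of rows) {
    yield row; // cannot `await timer.start()` in a function*
  }
}

// Async generator: `await` is legal, but callers must switch to `for await...of`,
// and every step now goes through a Promise/microtask.
async function* changesAsync(rows: Iterable<string>): AsyncIterable<string> {
  for (const row of rows) {
    await timer.start();
    yield row;
  }
}
```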
I think you are fine making this an AsyncIterator. The microtask queue does add some overhead, but we are already using `yield`, so I don't think it should be that bad.
Benchmark it!
FTR I wrote a benchmark that iterates through 10,000 rows in a simple SELECT statement:

```ts
import {bench, describe} from 'vitest';
// `Database` (the zero-cache sqlite wrapper) and `createSilentLogContext`
// are repo-internal helpers; their import paths are omitted here.

describe('sqlite iteration variations', () => {
  const db = new Database(
    createSilentLogContext(),
    '/tmp/zbugs-sync-replica.db',
  );
  const select = db.prepare('SELECT * FROM comment LIMIT 10000');

  // Plain synchronous iteration over the sqlite cursor.
  bench('synchronous', () => {
    for (const _ of select.iterate()) {
    }
  });

  // The same synchronous iterable consumed with for-await (a Promise per step).
  bench('synchronous with for await', async () => {
    for await (const _ of select.iterate()) {
    }
  });

  // The synchronous iterable wrapped in an async generator, consumed with for-await.
  bench('synchronous with wrapped async iterator', async () => {
    const asyncIterator = (async function* () {
      for (const item of select.iterate()) {
        yield item;
      }
    })();
    for await (const _ of asyncIterator) {
    }
  });
});
```
The plain synchronous iteration is reliably ~10% faster:

```
 ✓ zero-cache/no-pg bench/sqlite-iteration.bench.ts > sqlite iteration variations 1858ms
     name                                          hz     min      max     mean      p75      p99     p995     p999     rme  samples
   · synchronous                               108.03  8.6723   9.8659   9.2564   9.3097   9.8659   9.8659   9.8659  ±0.59%       55
   · synchronous with for await               98.1465  9.6024  10.6957  10.1888  10.2620  10.6957  10.6957  10.6957  ±0.50%       50
   · synchronous with wrapped async iterator  98.6003  9.5159  10.6587  10.1420  10.2224  10.6587  10.6587  10.6587  ±0.50%       50

 BENCH  Summary

  zero-cache/no-pg  synchronous - bench/sqlite-iteration.bench.ts > sqlite iteration variations
    1.10x faster than synchronous with wrapped async iterator
    1.10x faster than synchronous with for await
```
Granted, the overhead may be less significant depending on the cost of the logic inside the loop, but the cost of the additional `await`s and Promises over a plain table iteration is not negligible.
So I'll keep the code as it is (i.e. with sync iterators for row iteration and the occasional `await` for yielding).
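A minimal sketch of that retained shape, with illustrative stand-ins (`SLICE_MS`, `yieldToEventLoop()`, `processChanges()`) rather than the actual Timer API:

```ts
// Sketch only: rows come from a synchronous sqlite iterator, and the loop
// awaits only at time-slice boundaries to yield back to the event loop.
const SLICE_MS = 500;

function yieldToEventLoop(): Promise<void> {
  // setImmediate() defers until after pending I/O callbacks have run.
  return new Promise<void>(resolve => setImmediate(resolve));
}

async function processChanges(
  rows: Iterable<unknown>,
  apply: (row: unknown) => void,
): Promise<void> {
  let sliceStart = Date.now();
  for (const row of rows) {
    // Synchronous fetch: no Promise/microtask per row.
    apply(row);
    if (Date.now() - sliceStart >= SLICE_MS) {
      await yieldToEventLoop(); // occasional await, only at slice boundaries
      sliceStart = Date.now();
    }
  }
}
```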
omg you found it!!! huge!!
LGTM
Improve IVM time-slicing to run slices in separate iterations of the Node.js event loop, thus allowing I/O processing to happen between slices and preventing connection failures due to unresponsive pings.
A simulator illustrates the behavior of the previous and new time-slicing implementations, based on a model "pipeline."
Previous implementation
The previous `setTimeout()`-based implementation, while achieving fairness across pipelines, was problematic in that the slices of all pipelines were run in a single iteration of the event loop (specifically, the timers phase), thereby incurring an I/O processing delay proportional to the number of pipelines. For example, with 20 pipelines (i.e. ViewSyncers) in the process, the ping delay goes up to 20 × 500 ms = 10 seconds.
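For illustration only (not the actual ViewSyncer code), a minimal sketch of why the `setTimeout()`-based approach starves I/O:

```ts
// Illustration only: all setTimeout() callbacks that are due when the timers
// phase begins run back-to-back in that single event loop iteration, so I/O
// (e.g. answering a ping) waits until every pipeline's slice has finished.
const PIPELINES = 20;
const SLICE_MS = 500;

function runSlice(id: number) {
  const end = Date.now() + SLICE_MS;
  while (Date.now() < end) {
    // simulate ~500ms of synchronous IVM work
  }
  console.log(`pipeline ${id} finished a slice`);
}

for (let id = 0; id < PIPELINES; id++) {
  // All 20 callbacks become due immediately and run in one timers phase:
  // worst-case delay before I/O is serviced again is 20 × 500ms = 10s.
  setTimeout(() => runSlice(id), 0);
}
```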
New implementation
The new time-slicing implementation leverages a queue (implemented with a `Lock`) to:

- serialize time slices across pipelines, scheduling each slice with `setImmediate()` to run logic after I/O, and
- run at most one slice per event loop iteration, since the check phase does not pick up a newly queued `setImmediate()` in that phase.

As explained in the specification of `setImmediate()`, an immediate queued from inside an executing callback is not triggered until the next event loop iteration.
The resulting behavior achieves the same fairness and overall completion time, but allows I/O to be processed between slices, capping the ping delay to the duration of a single time slice.
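A minimal sketch of that scheme, assuming a plain promise chain in place of the actual `Lock`:

```ts
// Sketch only: a queue (here a promise chain standing in for the Lock)
// serializes slices across pipelines, and each slice is started via
// setImmediate() so that pending I/O callbacks run between slices.
let queue: Promise<void> = Promise.resolve();

function runSliceAfterIO(slice: () => void): Promise<void> {
  queue = queue.then(
    () =>
      new Promise<void>(resolve => {
        // setImmediate() runs in the check phase, after poll (I/O) callbacks.
        // An immediate queued from inside a running callback waits for the
        // next event loop iteration, so at most one slice runs per iteration.
        setImmediate(() => {
          slice();
          resolve();
        });
      }),
  );
  return queue;
}

// Each pipeline submits its next slice; a pending ping is serviced between
// any two slices, so the ping delay is bounded by one slice duration.
for (let id = 0; id < 20; id++) {
  void runSliceAfterIO(() => {
    // one time slice of IVM work for this pipeline would run here
    console.log(`pipeline ${id} ran a slice`);
  });
}
```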
Miscellaneous
Also consolidated the timer and yielding code, and made it such that we yield before the first time slice (e.g. so that we don't run into the same problem when a bunch of ViewSyncers start processing a long advancement).
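As a rough sketch of the consolidated timer (illustrative only: `start()` and `startWithoutYielding()` appear in the diff above, but the rest of the shape is an assumption):

```ts
// Illustrative sketch, not the actual Timer: start() yields to the event loop
// before the first time slice begins, so that a burst of ViewSyncers starting
// a long advancement does not block I/O before their first slice; the
// startWithoutYielding() variant is for callers that cannot await (e.g. inside
// a synchronous generator, as in the diff above).
class SliceTimer {
  #sliceStart = 0;

  async start(): Promise<void> {
    await new Promise<void>(resolve => setImmediate(resolve)); // yield first
    this.#sliceStart = performance.now();
  }

  startWithoutYielding(): void {
    this.#sliceStart = performance.now();
  }

  elapsed(): number {
    return performance.now() - this.#sliceStart;
  }
}
```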