-
Notifications
You must be signed in to change notification settings - Fork 96
chore(zero-cache): introduce ZeroEvents for surfacing real-time events for ops/observability #4736
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Branch | darkgnotic/zero-events |
Testbed | Linux |
🚨 2 Alerts
Benchmark | Measure Units | View | Benchmark Result (Result Δ%) | Lower Boundary (Limit %) |
---|---|---|---|---|
src/client/zero.bench.ts > pk compare > pk = N | Throughput operations / second (ops/s) x 1e3 | 📈 plot 🚷 threshold 🚨 alert (🔔) | 21.51 ops/s x 1e3(-24.64%)Baseline: 28.54 ops/s x 1e3 | 24.50 ops/s x 1e3 (113.90%) |
src/client/zero.bench.ts > with filter > Lower rows 500 x 10 columns (numbers) | Throughput operations / second (ops/s) x 1e3 | 📈 plot 🚷 threshold 🚨 alert (🔔) | 2.35 ops/s x 1e3(-8.12%)Baseline: 2.56 ops/s x 1e3 | 2.47 ops/s x 1e3 (105.16%) |
Click to view all benchmark results
Benchmark | Throughput | Benchmark Result operations / second (ops/s) x 1e3 (Result Δ%) | Lower Boundary operations / second (ops/s) x 1e3 (Limit %) |
---|---|---|---|
src/client/custom.bench.ts > big schema | 📈 view plot 🚷 view threshold | 366.99 ops/s x 1e3(-4.54%)Baseline: 384.45 ops/s x 1e3 | 353.76 ops/s x 1e3 (96.40%) |
src/client/zero.bench.ts > basics > All 1000 rows x 10 columns (numbers) | 📈 view plot 🚷 view threshold | 1.61 ops/s x 1e3(-1.04%)Baseline: 1.63 ops/s x 1e3 | 1.59 ops/s x 1e3 (98.58%) |
src/client/zero.bench.ts > pk compare > pk = N | 📈 view plot 🚷 view threshold 🚨 view alert (🔔) | 21.51 ops/s x 1e3(-24.64%)Baseline: 28.54 ops/s x 1e3 | 24.50 ops/s x 1e3 (113.90%) |
src/client/zero.bench.ts > with filter > Lower rows 500 x 10 columns (numbers) | 📈 view plot 🚷 view threshold 🚨 view alert (🔔) | 2.35 ops/s x 1e3(-8.12%)Baseline: 2.56 ops/s x 1e3 | 2.47 ops/s x 1e3 (105.16%) |
|
Branch | darkgnotic/zero-events |
Testbed | Linux |
Click to view all benchmark results
Benchmark | File Size | Benchmark Result kilobytes (KB) (Result Δ%) | Upper Boundary kilobytes (KB) (Limit %) |
---|---|---|---|
zero-package.tgz | 📈 view plot 🚷 view threshold | 1,280.67 KB(+0.37%)Baseline: 1,276.00 KB | 1,301.52 KB (98.40%) |
zero.js | 📈 view plot 🚷 view threshold | 207.51 KB(0.00%)Baseline: 207.51 KB | 211.66 KB (98.04%) |
zero.js.br | 📈 view plot 🚷 view threshold | 58.11 KB(0.00%)Baseline: 58.11 KB | 59.28 KB (98.04%) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
"version": "0.0.0", | ||
"private": true, | ||
"type": "module", | ||
"module": "true", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what this is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for the lack of context. i which we could have comments in package.json
eventually i intend to publish this package alone as something like rocicorp/zero-events
, so that production / operational code (i.e. cloudzero) can import just the ZeroEvent interfaces and not full zero package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but "module": "true",
is not part of package.json spec as far as I can tell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, then it looks like I just cargo-culted it from another (mistaken) package.json 😅
}, | ||
"devDependencies": { | ||
"@rocicorp/eslint-config": "^0.7.0", | ||
"@rocicorp/prettier-config": "^0.3.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vitest?
It does look like this package has no test so you should remove the test scripts
export const replicationStatusEventSchema = statusEventSchema.extend({ | ||
type: v | ||
.literal('zero/events/status/replication/v1') | ||
.assert(v => v.startsWith(STATUS_EVENT_PREFIX)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this assert does here. Seems like it is more of a "safety check". You can probably achieve the same thing using typescript satisfies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I got something to work using template literals, so we can avoid the somewhat cryptic use of assert.
void publishFn(lc, event); | ||
} | ||
|
||
export async function publishCriticalEvent<E extends ZeroEvent>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this just the same as publishEvent? What is the usecase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most events are published fire-and-forget in the steady state.
But for errors, particularly when the server is going to be shut down, I expose this method that is intended to be await
ed to make sure that the error event gets published before the server shuts down. This is to guarantee that the error makes it out of the server before it dies.
return errorDetails; | ||
} | ||
|
||
const pathUnused = {push: () => {}, pop: () => {}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Path unused can also be an array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm ... can you clarify? I'm looking at this method:
mono/packages/shared/src/json.ts
Line 184 in 842c807
export function isJSONValue(v: unknown, path: Path): v is JSONValue { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The usage of isJSONValue usually takes an array. Array implements Path :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aah, I see. Roger.
component: 'replication', | ||
stage: 'Initializing', | ||
status: 'OK', | ||
description: /Copying \d+ upstream tables at version \w+/, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL, you can use RegExps in these
type: 'zero/events/status/replication/v1', | ||
component: 'replication', | ||
stage: 'Initializing', | ||
status: 'OK', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why doesn't this have the description?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first status we publish when we start initial sync, before we've touched upstream at all. I couldn't think of any useful information to embellish here ... seems like "Initializing" is enough? It will be followed up with the "Copying ... " event pretty quickly. 🤷
lc, | ||
'Indexing', | ||
`Creating ${indexes.length} indexes`, | ||
5000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is 5000? and how is it exposed/tested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the interval between continuous updates. So during initial sync, events will be published every 5 seconds to surface the latest state of initial sync. In the PG case, it's mainly the replica size that will change.
It isn't tested. 🤫
publishEvent, | ||
} from '../../observability/events.ts'; | ||
|
||
const byKeys = ([a]: [string, unknown], [b]: [string, unknown]) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is hot but for hot code do not destructure arrays
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not super hot ... events should be infrequent. But I'll remember this. Thank you! 🙏
interval = 0, | ||
): this { | ||
this.stop(); | ||
publishEvent( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make more sense to make publishEvent a property that is passed in instead of overriding it for tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would be cleaner ... bit it would be a lot of plumbing to get to this object from the top level of the test. i chose the lazy / less invasive route. i think it's "okay" because tests are run serially per process ... but please correct me if i'm mistaken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @arv!
void publishFn(lc, event); | ||
} | ||
|
||
export async function publishCriticalEvent<E extends ZeroEvent>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most events are published fire-and-forget in the steady state.
But for errors, particularly when the server is going to be shut down, I expose this method that is intended to be await
ed to make sure that the error event gets published before the server shuts down. This is to guarantee that the error makes it out of the server before it dies.
return errorDetails; | ||
} | ||
|
||
const pathUnused = {push: () => {}, pop: () => {}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm ... can you clarify? I'm looking at this method:
mono/packages/shared/src/json.ts
Line 184 in 842c807
export function isJSONValue(v: unknown, path: Path): v is JSONValue { |
type: 'zero/events/status/replication/v1', | ||
component: 'replication', | ||
stage: 'Initializing', | ||
status: 'OK', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first status we publish when we start initial sync, before we've touched upstream at all. I couldn't think of any useful information to embellish here ... seems like "Initializing" is enough? It will be followed up with the "Copying ... " event pretty quickly. 🤷
lc, | ||
'Indexing', | ||
`Creating ${indexes.length} indexes`, | ||
5000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the interval between continuous updates. So during initial sync, events will be published every 5 seconds to surface the latest state of initial sync. In the PG case, it's mainly the replica size that will change.
It isn't tested. 🤫
publishEvent, | ||
} from '../../observability/events.ts'; | ||
|
||
const byKeys = ([a]: [string, unknown], [b]: [string, unknown]) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not super hot ... events should be infrequent. But I'll remember this. Thank you! 🙏
interval = 0, | ||
): this { | ||
this.stop(); | ||
publishEvent( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would be cleaner ... bit it would be a lot of plumbing to get to this object from the top level of the test. i chose the lazy / less invasive route. i think it's "okay" because tests are run serially per process ... but please correct me if i'm mistaken.
"version": "0.0.0", | ||
"private": true, | ||
"type": "module", | ||
"module": "true", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for the lack of context. i which we could have comments in package.json
eventually i intend to publish this package alone as something like rocicorp/zero-events
, so that production / operational code (i.e. cloudzero) can import just the ZeroEvent interfaces and not full zero package.
Introduce ZeroEvents, a channel for surfacing realtime "events" for the purposes of production ops and observability.
As a monitoring signal, events differ from (1) metrics, which are periodic and limited to numeric values, and (2) logs, which are textual and generally have a multi-minute processing delay. Events, on the other hand, are designed for low latency delivery / processing with structured data payloads.
ZeroEvents are published using the Cloud Events specification, with configuration designed to facilitate integration with knative sink bindings. However, any CloudEvent routing framework can be used to route and handle the Cloud Events.
The first specific ZeroEvent type is used for publishing the status of replication. ReplicationStatusEvents are emitted:
The events themselves contain the replicated schema data, which facilitates verification by the operator that the expected tables and columns are being replicated.