Skip to content

Conversation

grgbkr
Copy link
Contributor

@grgbkr grgbkr commented May 6, 2023

Problem

Monday's revalidate connections cron is failing every time due to it exceeding memory limits. This is because the full list of connections is too large to fit in memory.

Solution

Refactor the two places where the AuthDO lists all connections (_authRevalidateConnections and _authInvalidateAll), to instead incrementally load the connections.

For _authRevalidateConnections we need to group the connections by roomID, since we revalidate connections a room at a time by fetching the room's current connections. To do this incrementally without having to iterate the entire list of connections incrementally for each room necessitates creating a new index by room id over the connections. To handle the creation of this index for existings AuthDO's, an AuthDO Storage Schema Version is introduced and stored in the AuthDO's storage at key auth_do_storage_schema_version. The AuthDO ensures the storage schema version is upgraded to the current version before processing any fetch.

Fixes #487

@vercel
Copy link

vercel bot commented May 6, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
replicache-docs-trunk ✅ Ready (Inspect) Visit Preview 💬 Add feedback May 8, 2023 5:21pm

@grgbkr grgbkr requested review from aboodman and arv May 6, 2023 19:38
@grgbkr grgbkr marked this pull request as ready for review May 6, 2023 19:39
Copy link
Contributor

@aboodman aboodman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be such a common need that it might be useful to just implement a general purpose index facility. I suppose it could be abstracted out later though.

}

const CONNECTION_KEY_PREFIX = 'connection/';
const CONNECTION_ROOM_INDEX_PREFIX = 'connection_room/';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like connections_by_room would be a more immediately obvious name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. done.

return `${CONNECTION_KEY_PREFIX}${encodeURIComponent(userID)}/`;
}

function connectionKeyToConnectionRoomIndexString(key: ConnectionKey): string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you purposely end it with /?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was just keeping this consistent with what I did for the connectionKeyToString and I can't remember why I ended that with slash.

}

function connectionKeyToConnectionRoomIndexString(key: ConnectionKey): string {
return `${CONNECTION_ROOM_INDEX_PREFIX}${encodeURIComponent(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, but you could use the below function inside this function to ensure they remain in sync.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. I also reworked this a bit so the index key reuses the connection key. it makes the index key longer, but will help ensure things keep in sync, and its closer to the approach you would take with a general index utility.

@grgbkr grgbkr changed the title fix: enable connection revalidationt to scale to more connections than can fit in memory at once fix: enable connection revalidation to scale to more connections than can fit in memory at once May 7, 2023
connectTimestamp: Date.now(),
};
await this._state.storage.put(connectionKey, connectionRecord);
recordConnection(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is getting so long, it's worth thinking about how to break it up for readability. Obviously pieces can be broken up into method helpers, but often doing that doesn't help too much unless the inputs and outputs are very clear. Is it possible/useful to factor out a bare connect() function or a set of bare function helpers? I find that makes things much easier to reason about because the inputs/outputs are reduced to just the parameters and return value vs state shared with the class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look at this as a follow up. This has gotten to be a beast of a function (though this particular change doesn't add to it).

return connectionKeys;
}

async function* createConnectionsByRoomGenerator(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming nit: consider generateConnectionsByRoom.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Generator is an implementation strategy for an iterator. No need to include that in the name.

@grgbkr grgbkr merged commit 0a171b1 into main May 8, 2023
@grgbkr grgbkr deleted the grgbkr/auth-do-oom branch May 8, 2023 19:43
Comment on lines +872 to +881
while (true) {
const nextRoomListResult = await storage.list({
startAfter: lastKey,
prefix: CONNECTIONS_BY_ROOM_INDEX_PREFIX,
limit: 1,
});
if (nextRoomListResult.size === 0) {
return;
}
const firstRoomIndexString: string = [...nextRoomListResult.keys()][0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is safe as next(). list returns a Map and the map iterators have no return methods so there is no cleanup behavior involved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just iterator over all (one) of them. Then if we wanted to change the limit to 5 or whatever, things would work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

const connectionsListResult = await storage.list({
startAfter: lastKey,
prefix: CONNECTION_KEY_PREFIX,
limit: 1000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could use a comment... same with createConnectionsByRoomGenerator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

const connectionRoomIndexString =
connectionKeyToConnectionRoomIndexString(connectionKey);
// no await to ensure the two puts are coalesced and done atomically
void storage.put(connectionKeyString, record);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.catch would be better.

Or

storage.put({
 [connectionKeyString]: record,
 [connectionRoomIndexString]: {},
}).catch(...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to multikey, great idea

Comment on lines +943 to +944
void storage.delete(connectionKeyString);
void storage.delete(connectionRoomIndexString);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete also has a multi key version

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to multi key

createConnectionKeyStringsGenerator(storage);
for await (const connectionKeyString of connectionKeyStringsGenerator) {
const connectionKey = must(connectionKeyFromString(connectionKeyString));
void storage.put(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why no await?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going for coallescing all of them to make it atomic. However, that can actually lead to OOM, as the coallesced writes can exceed memory limits. Added the await, which means the index building is not atomic, and can fail part of the way through... which is ok. We only update thee schema version meta to the new version if the migration completes. If it fails next time server starts we will retry migration.

Copy link
Contributor

@arv arv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

grgbkr added a commit that referenced this pull request May 19, 2023
Update AuthDO schema migration logic to deal with down-migrates (i.e. dealing with rolling back a deploy that does a migration).  

Also address arv's feedback from #491
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Monday error: "Durable Object exceeded its memory limit"

3 participants