Add Support for UUID v7 Generation for User IDs #41920
base: main
Signed-off-by: Keston Crandall <[email protected]>
pedroigor left a comment:
I need to catch up on UUID v7. Is your implementation based on https://www.rfc-editor.org/rfc/rfc9562.html?
I think we need more changes to make sure this change is applied consistently, and ideally we should have a single place that decides which UUID version to use, so that the choice applies everywhere.
We probably also want this configuration to be a user provider configuration option. That way, you should be able to set the version via the CLI, environment variables, or system properties.
I'll check with others on the team about the risks and concerns of such a change. My initial take agrees with yours: it should not break existing deployments, and different UUID versions can co-exist. But I'm not 100% sure.
I'm pretty sure it's that RFC; it's the version included in Postgres 18. I am unfamiliar with how Keycloak is architected, so I tried to make minimal changes to the repo. Another option would be to change the env variable to something like:
This would be more portable to future versions that may come out in the next 10 years; right now only v4 and v7 really matter. v4 is more secure because it's more random, while v7 is time-ordered, so if you know when a UUID was created, a brute-force attack to guess a user's ID is easier (though still very, very hard, because the last part is still random). That's why it should be the user's choice which UUID version Keycloak uses. The other reason I put USER in the env var name is that I didn't want users to think this affects UUIDs across the entire app. I didn't want to override the current generateId() method, since that might have side effects, so I made a new one. But my guess is that you could inject the environment variable into that function, and I assume it would then change IDs everywhere, as long as the rest of the codebase uses that utility.
It is a sensitive change. Perhaps we want to pick a version when creating a database. There are also concerns about the lack of support in Java, and perhaps we want to wait until Java supports v7. From a security point of view, could exposing the creation date of resources, like users or sessions, make brute-force attacks easier? We need to evaluate the trade-offs carefully, even if we make it configurable. By "everywhere", I meant wherever we persist data. I guess the main argument for this change is faster persisting and fetching of data from the database. Some benchmarks based on a database with lots of users, for instance, would be helpful.
From a user's point of view, the most important thing is the user ID: you can use it as the primary key in your database, so the auth layer doesn't need to do a SELECT to match an internal UUID with the Keycloak one. Every other resource in Keycloak doesn't matter to me personally. It's just that the user ID is used as a foreign key in so many tables in almost every database. Basically, in v7 the leading bytes are based on the timestamp, unlike in v4: those first bytes are directly convertible to a timestamp, while the remaining part is completely random. In UUID v4, almost all of the bytes are random and none are based on time. I think the risks are low, especially for a user ID. By risks I basically meant: with an integer primary key you can just iterate 1, 2, 3, … on a REST API, right? With v7 you could take the first half and then brute-force the second half to try to find users who were created at that time. It would still take forever. I said "technically"; below is what Claude told me about the number of combinations you'd have to generate for the exact millisecond: "If you know the exact timestamp in a UUID v7, you significantly reduce the total entropy but still retain substantial uniqueness. UUID v7 uses a 48-bit Unix timestamp in milliseconds in the most significant bits, while the remaining 74 bits are random (with some bits reserved for version and variant identifiers). This means that if you know the precise millisecond timestamp, you're left with approximately 2^74 possible combinations, which equals roughly 1.9 × 10^22 unique values. Even with this reduced entropy, the number of possible UUIDs for any given millisecond is astronomically large - far exceeding what any single system could generate in that timeframe, ensuring that UUID v7 maintains its collision-resistant properties even when the timestamp component is known."
https://github.com/equenum/postgre_uuid_performance
PostgreSQL UUID v7 vs v4 Performance Comparison
Performance Metrics Table
Key Performance Characteristics
Insert Performance
Storage Efficiency
Query Performance
Memory Usage
Test Environment Details
Hardware Specifications:
Test Parameters:
Performance Scaling Observations
Sources
Note: PostgreSQL 18 will include native UUID v7 support via the
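To make the point about the leading bytes concrete: in the RFC 9562 layout, the top 48 bits of a UUID v7 are a big-endian Unix timestamp in milliseconds, so anyone holding the ID can read back its creation time. The sketch below decodes it with plain java.util.UUID. The class name UuidV7Inspector is hypothetical, and the sample value used in the usage note is just an illustrative UUID v7 whose timestamp field is 0x017F22E279B0.

```java
import java.time.Instant;
import java.util.UUID;

// Hypothetical helper: reads the timestamp field of a UUID v7 (RFC 9562 layout).
final class UuidV7Inspector {
    private UuidV7Inspector() {}

    // The top 48 bits of the most significant long hold unix_ts_ms.
    static long timestampMillis(UUID uuid) {
        if (uuid.version() != 7) {
            throw new IllegalArgumentException("not a UUID v7: " + uuid);
        }
        return uuid.getMostSignificantBits() >>> 16;
    }

    // Convenience wrapper returning the creation time as an Instant.
    static Instant creationTime(UUID uuid) {
        return Instant.ofEpochMilli(timestampMillis(uuid));
    }
}
```

For example, UuidV7Inspector.creationTime(UUID.fromString("017F22E2-79B0-7CC3-98C4-DC0C0C07398F")) returns 2022-02-22T19:22:22Z. The remaining bits give no such information, which is why the brute-force search described above still has to cover the random part.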
Regarding your comment about built-in Java support, I don't think it's a big deal: the algorithm to generate it is pretty small. We can just swap out the util for an official version if one ever comes out. I don't think it's worth waiting for an official release in Java. Here is a lib, but I think it's better to just write a simple util for something this simple. C# got it last year with .NET 9, but I'm not sure Java keeps up as quickly as .NET does. Before the official UUIDv7, people were using ULID, which is similar; UUIDv7 is just the official version that tries to solve the same problem ULID has addressed since around 2016. They are all just random identifiers with different structures, so they're usually pretty easy to interchange. They all fit into the same data type.
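The claim that the algorithm is small holds up. Below is a minimal sketch of a UUID v7 generator using only the JDK, following the RFC 9562 layout (48-bit millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits). The class name UuidV7 is hypothetical and this is not the PR's generateIdv7() code. It also omits the optional sub-millisecond counter that RFC 9562 describes for monotonicity, so two IDs created in the same millisecond are not guaranteed to sort in creation order.

```java
import java.security.SecureRandom;
import java.util.UUID;

// Hypothetical utility: a sketch of UUID v7 generation per the RFC 9562 bit layout.
final class UuidV7 {
    private static final SecureRandom RANDOM = new SecureRandom();

    private UuidV7() {}

    static UUID generate() {
        return generate(System.currentTimeMillis());
    }

    static UUID generate(long unixMillis) {
        if (unixMillis < 0 || unixMillis >= (1L << 48)) {
            throw new IllegalArgumentException("timestamp must fit in 48 bits");
        }
        // High 64 bits: 48-bit unix_ts_ms | 4-bit version (0111) | 12-bit rand_a
        long msb = (unixMillis << 16) | 0x7000L | (RANDOM.nextInt() & 0x0FFFL);
        // Low 64 bits: 2-bit variant (10) | 62-bit rand_b
        long lsb = (RANDOM.nextLong() & 0x3FFF_FFFF_FFFF_FFFFL) | 0x8000_0000_0000_0000L;
        return new UUID(msb, lsb);
    }

    // Same canonical 36-character string form as UUID.randomUUID().toString().
    static String generateId() {
        return generate().toString();
    }
}
```

Because the result is an ordinary java.util.UUID, swapping this for an official JDK method later would only touch the body of generate().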
stianst left a comment:
We shouldn't add UUIDv7 support until it's an accepted standard; at the moment it's a proposed standard. Ideally we should also wait for Java to support it, but if there's enough demand we can always do it earlier.
Bear in mind also that UUIDv7s are longer than the current UUIDs we generate; this affects the UI, tokens, etc.
We'd probably also want to consider using UUIDv7 everywhere, or at least for one type of entity, like users. In this PR, UUIDv7 is only used when partially importing users into an existing realm, and it makes no sense to have it only there.
Finally, we don't use environment variables to expose configuration options. I'm not sure how we would want to make this configurable, but it certainly won't be through environment variables.
Maybe what would make sense is to add an SPI/provider for ID generation? The default provider can behave like we do today, and we could add an experimental one for UUIDv7. That makes it possible for folks to use custom generators, and also provides a nice way to configure it, including the ability to have experimental and preview providers.
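A rough sketch of the SPI idea, in plain Java rather than Keycloak's actual Spi/ProviderFactory classes: one IdGenerator interface, a default UUID v4 implementation that matches the current behaviour, and an experimental UUID v7 one, looked up by provider id. All names here (IdGenerator, UuidV4Generator, UuidV7Generator, IdGenerators.select, and the "default" and "uuidv7" ids) are hypothetical.

```java
import java.security.SecureRandom;
import java.util.Map;
import java.util.UUID;

// Hypothetical SPI contract: a single method used for every generated ID.
interface IdGenerator {
    String generateId();
}

// Default provider: random UUID v4, i.e. the current behaviour.
final class UuidV4Generator implements IdGenerator {
    @Override
    public String generateId() {
        return UUID.randomUUID().toString();
    }
}

// Experimental provider: time-ordered UUID v7 (RFC 9562 layout).
final class UuidV7Generator implements IdGenerator {
    private static final SecureRandom RANDOM = new SecureRandom();

    @Override
    public String generateId() {
        long msb = (System.currentTimeMillis() << 16) | 0x7000L | (RANDOM.nextInt() & 0x0FFFL);
        long lsb = (RANDOM.nextLong() & 0x3FFF_FFFF_FFFF_FFFFL) | 0x8000_0000_0000_0000L;
        return new UUID(msb, lsb).toString();
    }
}

// Hypothetical registry standing in for provider lookup by configured id.
final class IdGenerators {
    private static final Map<String, IdGenerator> PROVIDERS = Map.of(
            "default", new UuidV4Generator(),
            "uuidv7", new UuidV7Generator());

    private IdGenerators() {}

    // Falls back to the default provider when the id is null or unknown.
    static IdGenerator select(String providerId) {
        IdGenerator fallback = PROVIDERS.get("default");
        return providerId == null ? fallback : PROVIDERS.getOrDefault(providerId, fallback);
    }
}
```

Keeping the contract to a single generateId() method keeps the default provider trivially backward compatible; using different formats per entity would need a richer contract.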
UUIDv7 is the same length as UUIDv4; they are completely interchangeable. You can insert a v4 into a database table column full of v7s, and vice versa. Both are 128-bit; the only difference is how the bits are generated.
You're right, my bad. I just looked at an example UUIDv7 and thought it looked longer without actually checking ;)
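The length point is easy to check with java.util.UUID alone: v4 and v7 values are both stored as two 64-bit longs and render to the same 36-character canonical string, differing only in the version nibble. The class name UuidFormatCheck is hypothetical, and the v7 string in the usage note is an illustrative constant.

```java
import java.util.UUID;

// Hypothetical check: v4 and v7 share the same 128-bit representation and string form.
final class UuidFormatCheck {
    private UuidFormatCheck() {}

    static String describe(UUID id) {
        return "version=" + id.version()
                + " variant=" + id.variant()
                + " length=" + id.toString().length();
    }
}
```

UuidFormatCheck.describe(UUID.randomUUID()) returns "version=4 variant=2 length=36", and UuidFormatCheck.describe(UUID.fromString("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")) returns "version=7 variant=2 length=36".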
An SPI makes more sense for now, but we still need to look at the impact of changing the format and its usage when designing the contract. @kcrandall WDYT?
Yeah, I think that's fine, especially if Keycloak shipped with a way to easily configure a few options, such as UUID v4 and UUID v7, plus the option to provide your own. It might be a bit of over-engineering for something that is pretty simple, though. The only thing I question is whether it will ever be used for anything other than UUID versions. I could see it being more useful if you could change user IDs to lots of different things, like ints or other data types. I do feel that just having a configuration option for the UUID version (3, 4, 7) and marking it as experimental for now might be more straightforward, unless you see a future where this would have more options than just UUID versions. Also, I'm not that familiar with how that is done in Java. In .NET you would define an interface and register it with the DI container (IServiceCollection) using extension methods. I assume it's something similar in Java.
An SPI/provider really isn't over-engineering it; on the contrary, there is no need to introduce config options, etc. I would imagine we'd start really simple: only support generating IDs, and use the same provider for all IDs. It really is 10-15 minutes of effort. What might make it a bit more complicated is that KeycloakModelUtils.generateId doesn't have access to the KeycloakSession, which would require a bit more fiddling with the code to be able to obtain the provider, etc. Not particularly hard, though. What would make things more complicated is if we want to use different ID formats for different entities (say users have UUIDv7, while tokens or clients use something else).
@kcrandall Perhaps you can start by looking at this branch https://github.com/pedroigor/keycloak/tree/uuid-spi. It provides the bare minimum to introduce a
You can follow the same idea to implement a UUID v7 provider. To set which provider the server should use at runtime, you can set a configuration option like
In that branch, I've also updated the
There are quite a few things yet to look at, but I hope this gives you something to start with. As Stian said, we might want to set UUID versions on a per-resource-type basis (e.g. users, realm, clients, tokens, sessions, etc.) and have a mechanism to decide which provider to use based on the resource type. However, if we can start with users, perhaps we can already come up with something that you and others can evaluate.
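The per-resource-type idea could look roughly like this: a hypothetical ResourceType enum and a resolver that maps each type to a generator, falling back to random UUID v4 for anything not configured. In the usage below only users are switched to UUID v7. None of these names (ResourceType, ResourceIdResolver, use, generateId) exist in Keycloak; they are assumptions for illustration only.

```java
import java.security.SecureRandom;
import java.util.EnumMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical resource types that receive generated IDs.
enum ResourceType { USER, REALM, CLIENT, SESSION, TOKEN }

final class ResourceIdResolver {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Default generator: random UUID v4.
    static final Supplier<UUID> UUID_V4 = UUID::randomUUID;

    // Experimental generator: UUID v7 (48-bit ms timestamp, version 7, RFC 9562 variant).
    static final Supplier<UUID> UUID_V7 = () -> new UUID(
            (System.currentTimeMillis() << 16) | 0x7000L | (RANDOM.nextInt() & 0x0FFFL),
            (RANDOM.nextLong() & 0x3FFF_FFFF_FFFF_FFFFL) | 0x8000_0000_0000_0000L);

    // Per-type overrides; types without an entry use UUID_V4.
    private final Map<ResourceType, Supplier<UUID>> overrides = new EnumMap<>(ResourceType.class);

    ResourceIdResolver use(ResourceType type, Supplier<UUID> generator) {
        overrides.put(type, generator);
        return this;
    }

    String generateId(ResourceType type) {
        return overrides.getOrDefault(type, UUID_V4).get().toString();
    }
}
```

With new ResourceIdResolver().use(ResourceType.USER, ResourceIdResolver.UUID_V7), user IDs come out as v7 while clients, sessions and the rest keep v4, so expanding to other types later is a configuration change rather than a code change.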
@pedroigor that branch looks very good; I was misunderstanding how this flow would work in the project. I think starting with users is the best way to go. It's the use case that 99% of people would care about most. Extending it to other objects is something to do as people ask for it. It looks like you added the UUIDv7 provider, so is this good to go?
It depends on how we want to use this SPI with different resource types. If we want to start with users while being prepared to expand to other realm resource types (clients, groups, sessions, etc.), tokens (not sure that makes sense), etc., we need more work. I would like to have an issue open for discussion where we provide at least an overview of where UUIDs are used, so that we can build a plan for refactoring the code to allow changing the UUID format. It should give us more insight into the design of this SPI and how we are going to configure it. Another thing we need to consider is testing. And I would like to run some benchmarks using the
But yeah, if we want to start with users, I think that should be enough. I also need to check with @stianst and others on how much we should prioritize it.
Why start with users and not just do all UUIDs? The simplest approach is to use UUIDv7 everywhere. They have the same format, so it shouldn't really matter, and if it helps databases handle them more efficiently, then that's a win for everything?
I would have no problem with doing it everywhere. Whatever is easier.
@pedroigor I think we should just start with it everywhere, to see if that works fine and, as you said, whether we get any performance benefits from it. It would be the non-default provider, and marked as experimental for now, I'd say.
By everywhere, we are talking about realm resource types like:
But not:
For tokens, I don't think it makes much sense because we do not persist them. For sessions, could using UUIDv7 potentially help with session fixation, enumeration, or hijacking? That is why I would like to make it clearer where this SPI would apply, and to start with users to evaluate the benefits before expanding to other resources. But if we can at least decide on the resources for which we want to allow changing the UUID, I'm also open to having it "everywhere".
By everywhere I mean everywhere. Tokens, sessions, events, etc. are (or can be) stored in the database as well. UUIDv7s still have sufficient randomness and are unique, and we never rely on just a UUID to prove ownership of anything like a token or session; we use signatures for that. I'd be more than open to having the SPI support different types of IDs in the future, perhaps something like
However, the first thing we should really do is just replace all IDs with UUID v7 through an experimental provider, so some benchmarking can be done easily to see if there is any real value.
@stianst - I disagree with your statement as described in #38663
The requirements of NIST and ANSSI differ here: in some places at least 128 bits of randomness are required, and UUIDv7 has only 74 random bits, while the rest are a timestamp plus version and variant bits. So UUIDv7 is not a fit for those places.
Could you elaborate? We shouldn't be using UUIDs in places where they serve as more than an identifier. Take tokens, for instance: it's the signature that provides the security, not the jti.
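For reference, the bit budget behind the 74-bit figure works out as follows from the RFC 9562 layouts, in which both versions reserve 4 version bits and 2 variant bits:

```latex
\begin{aligned}
\text{UUID v4: } & 128 - 4_{\text{version}} - 2_{\text{variant}} = 122 \text{ random bits} \\
\text{UUID v7: } & 128 - 48_{\text{timestamp}} - 4_{\text{version}} - 2_{\text{variant}} = 74 \text{ random bits} \\
& 2^{74} \approx 1.89 \times 10^{22} \text{ possible values per millisecond}
\end{aligned}
```

Neither layout reaches 128 random bits, because the version and variant fields are fixed in both.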
After discussing with @ahus1, I think this needs a bit more consideration, and a review of the codebase, before we introduce something like this.
+1
Add Support for UUID v7 Generation for User IDs
Summary
This PR introduces optional UUID v7 generation for user creation in Keycloak, providing significant database performance improvements while maintaining full backward compatibility through an environment variable configuration.
Fixes #30459
What changed?
- Added a generateIdv7() method to KeycloakModelUtils for UUID v7 generation
- Modified the UsersPartialImport.create() method to conditionally use UUID v7 based on the environment variable KC_USER_UUID_V7
Motivation
Primary Use Case: External Database Performance
This enhancement primarily targets deployments where Keycloak user IDs are used as primary keys in external databases. Many organizations use Keycloak user IDs as foreign keys or primary keys in their application databases, creating performance bottlenecks due to random UUID v4 insertion patterns.
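The insertion-pattern argument can be illustrated without a database. The sketch below is a rough simulation, not a benchmark: a TreeSet ordered by the canonical string stands in for a B-tree primary-key index, and it counts how many inserts land after the current largest key (the cheap, right-hand-append case). InsertLocality and v7At are hypothetical names, and the UUID v7 values are built inline following RFC 9562.

```java
import java.security.SecureRandom;
import java.util.TreeSet;
import java.util.UUID;
import java.util.function.LongFunction;

// Hypothetical simulation of index insert locality for v4 vs v7 keys.
final class InsertLocality {
    private static final SecureRandom RANDOM = new SecureRandom();

    private InsertLocality() {}

    // UUID v7 for an explicit millisecond timestamp (RFC 9562 layout).
    static UUID v7At(long unixMillis) {
        long msb = (unixMillis << 16) | 0x7000L | (RANDOM.nextInt() & 0x0FFFL);
        long lsb = (RANDOM.nextLong() & 0x3FFF_FFFF_FFFF_FFFFL) | 0x8000_0000_0000_0000L;
        return new UUID(msb, lsb);
    }

    // Counts inserts whose key is greater than every key already in the index.
    static int appendsAtEnd(int count, LongFunction<UUID> idForStep) {
        TreeSet<String> index = new TreeSet<>();
        int appends = 0;
        for (int i = 0; i < count; i++) {
            String key = idForStep.apply(i).toString();
            if (index.isEmpty() || key.compareTo(index.last()) > 0) {
                appends++;
            }
            index.add(key);
        }
        return appends;
    }
}
```

With one UUID v7 per millisecond, appendsAtEnd(10_000, i -> InsertLocality.v7At(1_700_000_000_000L + i)) counts every insert as an append, while the same call with UUID.randomUUID() counts only a handful, since each random key usually lands somewhere in the middle of the index. Workloads that create several IDs in the same millisecond will see less perfect ordering, because this sketch keeps the 12 rand_a bits random instead of using a counter.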
Database Performance Issues with UUID v4
When Keycloak user IDs are used as primary keys in external databases, UUID v4 (random UUIDs) create several performance challenges:
Benefits of UUID v7
UUID v7 addresses these issues by incorporating a timestamp component while maintaining UUID compatibility:
Technical Details
UUID v7 Structure
- Version bits: 0111 (7)
- Variant bits: 10
Backward Compatibility Strategy
The implementation uses an opt-in approach to ensure zero impact on existing deployments:
- UUID v7 is only used when KC_USER_UUID_V7=true is explicitly set
Note: Actual results depend on the database engine, hardware, and workload patterns.
Configuration
Environment Variable
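Based on the change summary above, the opt-in switch amounts to something like the following. This is a minimal sketch, not the PR's code: the class UserIdSelector and its injectable lookup function are hypothetical, only the exact value "true" enables UUID v7 here (an assumption), and the UUID v7 bits are built inline following RFC 9562.

```java
import java.security.SecureRandom;
import java.util.UUID;
import java.util.function.UnaryOperator;

// Hypothetical sketch of an opt-in KC_USER_UUID_V7 switch for new user IDs.
final class UserIdSelector {
    static final String FLAG = "KC_USER_UUID_V7";
    private static final SecureRandom RANDOM = new SecureRandom();

    private UserIdSelector() {}

    // Reads the real process environment.
    static String newUserId() {
        return newUserId(System::getenv);
    }

    // The lookup is injectable so the behaviour can be tested without real env vars.
    static String newUserId(UnaryOperator<String> env) {
        boolean useV7 = "true".equals(env.apply(FLAG));
        return useV7 ? uuidV7().toString() : UUID.randomUUID().toString();
    }

    // UUID v7: 48-bit ms timestamp, version 0111, 12 random bits, variant 10, 62 random bits.
    private static UUID uuidV7() {
        long msb = (System.currentTimeMillis() << 16) | 0x7000L | (RANDOM.nextInt() & 0x0FFFL);
        long lsb = (RANDOM.nextLong() & 0x3FFF_FFFF_FFFF_FFFFL) | 0x8000_0000_0000_0000L;
        return new UUID(msb, lsb);
    }
}
```

When the variable is unset or has any other value, the method falls back to UUID.randomUUID(), so existing deployments keep generating v4 IDs.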
Safety Considerations
Why Environment Variable is Safe
Migration Strategy for Existing Deployments
For organizations wanting to adopt UUID v7:
UUID Version Coexistence
The system gracefully handles mixed UUID versions:
Implementation Details
Files Modified
- server-spi-private/src/main/java/org/keycloak/models/utils/KeycloakModelUtils.java
- services/src/main/java/org/keycloak/partialimport/UsersPartialImport.java
New Methods Added
- generateIdv7() - New UUID v7 generator method
Modified Methods
- UsersPartialImport.create() - Added conditional UUID v7 generation based on the KC_USER_UUID_V7 environment variable
Testing
Unit Tests
testUUIDv7())