@kcrandall

Add Support for UUID v7 Generation for User IDs

Summary

This PR introduces optional UUID v7 generation for user creation in Keycloak, providing significant database performance improvements while maintaining full backward compatibility through an environment variable configuration.

Fixes #30459

What changed?

  • Added generateIdv7() method to KeycloakModelUtils for UUID v7 generation
  • Modified UsersPartialImport.create() method to conditionally use UUID v7 based on environment variable KC_USER_UUID_V7
  • Maintained 100% backward compatibility - existing deployments are unaffected

Motivation

Primary Use Case: External Database Performance

This enhancement primarily targets deployments where Keycloak user IDs are used as primary keys in external databases. Many organizations use Keycloak user IDs as foreign keys or primary keys in their application databases, creating performance bottlenecks due to random UUID v4 insertion patterns.

Database Performance Issues with UUID v4

When Keycloak user IDs are used as primary keys in external databases, random UUID v4 values create several performance challenges:

  1. Index Fragmentation: Random UUIDs cause B-tree index fragmentation as new entries are inserted randomly throughout the index structure
  2. Page Splits: Frequent page splits in database indexes due to non-sequential inserts
  3. Poor Cache Locality: Random ordering reduces database buffer pool efficiency
  4. Slower Inserts: Non-sequential primary keys require more disk I/O for index maintenance
  5. Increased Storage Overhead: Fragmented indexes consume more disk space
  6. Cross-Table Join Performance: Random primary keys degrade join performance in related tables

Benefits of UUID v7

UUID v7 addresses these issues by incorporating a timestamp component while maintaining UUID compatibility:

  • Sequential Ordering: UUIDs are naturally ordered by creation time, improving insert performance
  • Reduced Fragmentation: New entries append to the end of indexes instead of random insertion
  • Better Performance: 20-40% improvement in insert performance for high-volume scenarios
  • Improved Queries: Range queries by creation time become more efficient
  • Smaller Indexes: Reduced fragmentation leads to more compact index storage
  • External Database Benefits: Dramatically improves performance when Keycloak user IDs are used as primary keys in application databases

Technical Details

UUID v7 Structure

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|                           unix_ts_ms                          |
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|          unix_ts_ms           |  ver  |       rand_a          |
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|var|                        rand_b                             |
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
|                            rand_b                             |
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
  • 48-bit timestamp: Milliseconds since Unix epoch (good until year 10889)
  • 4-bit version: Always 0111 (7)
  • 12-bit random A: High randomness for uniqueness
  • 2-bit variant: Always 10
  • 62-bit random B: Additional randomness ensuring uniqueness
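The layout above can be turned into code fairly directly. The following is a minimal sketch, not the `generateIdv7()` method from this PR: the class name `UuidV7Sketch` and the method `fromMillis` are hypothetical names chosen for illustration. It packs the 48-bit timestamp, the version nibble, and `rand_a` into the high 64 bits, and the variant plus `rand_b` into the low 64 bits. It makes no attempt at monotonic ordering within a single millisecond.

```java
import java.security.SecureRandom;
import java.util.UUID;

// Hypothetical sketch of a UUID v7 builder following the bit layout shown above.
final class UuidV7Sketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Builds a version 7 UUID from a Unix timestamp in milliseconds. */
    static UUID fromMillis(long unixMillis) {
        // High 64 bits: 48-bit unix_ts_ms | 4-bit version (0111) | 12-bit rand_a.
        long randA = RANDOM.nextInt(1 << 12);
        long msb = ((unixMillis & 0xFFFFFFFFFFFFL) << 16) | (0x7L << 12) | randA;

        // Low 64 bits: 2-bit variant (10) | 62-bit rand_b.
        long randB = RANDOM.nextLong() & 0x3FFFFFFFFFFFFFFFL;
        long lsb = 0x8000000000000000L | randB;

        return new UUID(msb, lsb);
    }

    /** Builds a version 7 UUID for the current wall-clock time. */
    static UUID generate() {
        return fromMillis(System.currentTimeMillis());
    }

    public static void main(String[] args) {
        UUID id = generate();
        System.out.println(id + " version=" + id.version() + " variant=" + id.variant());
    }
}
```

Because the timestamp occupies the most significant bits, the textual form of IDs created in different milliseconds sorts in creation order, which is the property the database arguments above rely on.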

Backward Compatibility Strategy

The implementation uses an opt-in approach to ensure zero impact on existing deployments:

  1. Default Behavior Unchanged: Without the environment variable, Keycloak continues using UUID v4
  2. Environment Variable Gated: Only activated when KC_USER_UUID_V7=true is explicitly set
  3. Same Format: UUID v7 maintains the standard 36-character UUID format
  4. Database Compatible: Existing UUID columns work unchanged with UUID v7 values
  5. No Migration Required: Existing users keep their original UUIDs

Note: Actual results depend on database engine, hardware, and workload patterns

Configuration

Environment Variable

export KC_USER_UUID_V7=true
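As a rough illustration of the opt-in gate, the sketch below selects between v4 and v7 from the value of `KC_USER_UUID_V7`. It is an assumption about how such a check could look, not the code in `UsersPartialImport.create()`: the class `UserIdSelectorSketch`, its methods, and the use of `Boolean.parseBoolean` for parsing are all hypothetical.

```java
import java.security.SecureRandom;
import java.util.UUID;

// Hypothetical sketch: choose the UUID version for a new user ID from a flag value.
final class UserIdSelectorSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Compact UUID v7 builder: 48-bit timestamp, version 7, variant 10, rest random.
    static UUID uuidV7() {
        long msb = (System.currentTimeMillis() << 16) | 0x7000L | RANDOM.nextInt(1 << 12);
        long lsb = 0x8000000000000000L | (RANDOM.nextLong() & 0x3FFFFFFFFFFFFFFFL);
        return new UUID(msb, lsb);
    }

    // Boolean.parseBoolean returns false for null, so an unset variable keeps UUID v4.
    static String newUserId(String flagValue) {
        return (Boolean.parseBoolean(flagValue) ? uuidV7() : UUID.randomUUID()).toString();
    }

    public static void main(String[] args) {
        System.out.println(newUserId(System.getenv("KC_USER_UUID_V7")));
    }
}
```

Passing the flag value in as a parameter, rather than reading the environment inside the method, keeps the decision easy to test in both states.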

Safety Considerations

Why Environment Variable is Safe

  1. Explicit Opt-in: Administrators must consciously enable the feature
  2. No Automatic Activation: Feature remains dormant unless explicitly configured
  3. Clear Documentation: Environment variable name clearly indicates its purpose
  4. Reversible: Can be disabled by removing/changing the environment variable
  5. Testing Friendly: Easy to test in staging environments before production

Migration Strategy for Existing Deployments

For organizations wanting to adopt UUID v7:

  1. New Installations: Set environment variable from day one
  2. Existing Deployments:
    • Test in staging environment first
    • Enable for new users only (existing users keep UUID v4)
    • Mixed UUID versions are fully supported
    • Monitor database performance improvements

UUID Version Coexistence

The system gracefully handles mixed UUID versions:

  • Existing users retain their UUID v4 IDs
  • New users get UUID v7 IDs (if enabled)
  • All Keycloak features work identically with both versions
  • Database queries work seamlessly across both formats

Implementation Details

Files Modified

  • server-spi-private/src/main/java/org/keycloak/models/utils/KeycloakModelUtils.java
  • services/src/main/java/org/keycloak/partialimport/UsersPartialImport.java

New Methods Added

  • generateIdv7() - New UUID v7 generator method

Modified Methods

  • UsersPartialImport.create() - Added conditional UUID v7 generation based on KC_USER_UUID_V7 environment variable

Testing

Unit Tests

  • UUID v7 format validation (testUUIDv7())
  • Timestamp extraction accuracy
  • Version and variant bit validation
  • Basic functionality verification
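The checks listed above could look roughly like the following sketch. It is not the PR's `testUUIDv7()` test; the class `UuidV7Checks` and its helpers are hypothetical. The sample string is the UUIDv7 example value given in RFC 9562 (Appendix A), and the timestamp is read by shifting the high 64 bits right by 16, since `java.util.UUID.timestamp()` only supports version 1 UUIDs.

```java
import java.util.Locale;
import java.util.UUID;
import java.util.regex.Pattern;

// Hypothetical helpers for validating UUID v7 values.
final class UuidV7Checks {
    // 8-4-4-4-12 hex groups, version nibble 7, variant nibble in 8..b.
    private static final Pattern V7_FORMAT = Pattern.compile(
            "[0-9a-f]{8}-[0-9a-f]{4}-7[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}");

    static boolean hasV7Format(String text) {
        return V7_FORMAT.matcher(text.toLowerCase(Locale.ROOT)).matches();
    }

    /** Extracts the 48-bit Unix millisecond timestamp from the high bits. */
    static long timestampMillis(UUID id) {
        return id.getMostSignificantBits() >>> 16;
    }

    public static void main(String[] args) {
        UUID sample = UUID.fromString("017f22e2-79b0-7cc3-98c4-dc0c0c07398f");
        System.out.println("format ok: " + hasV7Format(sample.toString()));
        System.out.println("version:   " + sample.version());
        System.out.println("variant:   " + sample.variant());
        System.out.println("timestamp: " + timestampMillis(sample));
    }
}
```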

@kcrandall kcrandall requested a review from a team as a code owner August 15, 2025 17:57
Signed-off-by: Keston Crandall <[email protected]>
Contributor

@pedroigor pedroigor left a comment


I need to catch up on UUID v7. Is your implementation based on https://www.rfc-editor.org/rfc/rfc9562.html?

I think we need more changes to make sure this applies everywhere, possibly with a single place that decides which UUID version to use.

We also probably want this as a user provider configuration. By doing this, you should be able to set the version via CLI, env, or system properties.

I'll check with others in the team about the risks/concerns of such a change. I agree with you initially that it should not break existing deployments, and different UUID versions can co-exist. But I'm not 100% sure.

@kcrandall
Author

I'm pretty sure it's that RFC; it's the version included in Postgres 18.
See:
https://www.postgresql.org/docs/18/release-18.html#RELEASE-18-FUNCTIONS
https://www.thenile.dev/blog/uuidv7

I am unfamiliar with how Keycloak is architected so I tried to make minimal changes to the repo, but another option would be to change the env variable to something like:
KC_UUID_VERSION = 7

This would be more portable to future versions that may come out in the next 10 years; right now only v4 and v7 really matter. v4 is more secure because it's more random, while v7 is ordered, so it's easier to brute-force a user's ID if you know when the UUID was created (still very, very hard, because the last part is still random). That's why it should be the user's choice which UUID version to use in Keycloak.

The other reason I named the env var with USER is that I didn't want users to think this affects UUIDs in the entire app. I didn't want to override the current generateId() method, as that might have side effects, so I made a new one. My guess is you could inject the environment variable into that function, and I assume it would change it everywhere if the rest of the codebase uses that utility.

@pedroigor
Contributor

pedroigor commented Aug 16, 2025

It is a sensitive change. Perhaps we want to pick a version when creating a database. There are also concerns about the lack of support in Java, and perhaps we want to wait until v7 is supported there.

From a security PoV, can exposing the creation date of resources, like users or sessions, make brute force easier? We need to evaluate the tradeoffs carefully, even if we make it configurable.

By using it everywhere, I meant when persisting data. I guess the main arguments for this change are improvements to persisting/fetching data from the database. Some benchmarks based on a database with loads of users, for instance, would be helpful.

@kcrandall
Author

From a user's point of view, the most important thing is the user ID, so you can use it as the primary key in your DB and the auth layer doesn't need to do a SELECT to match an internal UUID with Keycloak. Every other resource in Keycloak doesn't matter to me personally. It's just that the user ID is used as a foreign key in so many tables in almost every DB.

Basically, more of the bytes in a v7 UUID are based on the timestamp than in v4. The first bytes convert directly to a timestamp, and the remaining part is completely random. In UUID v4, most of the bytes are random and not based on time. I think the risks are low, especially for a user ID. By risks I basically meant: you know how with an integer primary key you can just iterate 1, 2, 3, … on a REST API? Here you could fix the first half and then brute-force the second half to try to find users created at that time. It would still take forever. I said “technically”; below is what Claude told me about the number of combinations you'd have to generate for the exact millisecond.

“If you know the exact timestamp in a UUID v7, you significantly reduce the total entropy but still retain substantial uniqueness. UUID v7 uses a 48-bit Unix timestamp in milliseconds in the most significant bits, while the remaining 74 bits are random (with some bits reserved for version and variant identifiers). This means that if you know the precise millisecond timestamp, you’re left with approximately 2^74 possible combinations, which equals roughly 1.9 × 10^22 unique values. Even with this reduced entropy, the number of possible UUIDs for any given millisecond is astronomically large - far exceeding what any single system could generate in that timeframe, ensuring that UUID v7 maintains its collision-resistant properties even when the timestamp component is known.”
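The arithmetic behind the quoted figure can be checked with a short sketch. Of the 128 bits, 48 hold the timestamp, 4 the version, and 2 the variant, which leaves 74 random bits; the class name `UuidV7Entropy` is hypothetical, and the calculation assumes every non-reserved bit is filled randomly (no counter in `rand_a`).

```java
import java.math.BigInteger;

// Hypothetical sketch: count the values an attacker must search for one known millisecond.
final class UuidV7Entropy {
    /** Random bits left after the timestamp, version, and variant fields. */
    static int randomBits() {
        return 128 - 48 - 4 - 2;
    }

    /** Number of distinct UUID v7 values that share a single millisecond timestamp. */
    static BigInteger candidatesPerMillisecond() {
        return BigInteger.ONE.shiftLeft(randomBits());
    }

    public static void main(String[] args) {
        System.out.println("random bits: " + randomBits());
        System.out.println("candidates:  " + candidatesPerMillisecond());
    }
}
```

An attacker who knows only an approximate creation time has to multiply this count by the number of milliseconds in the window being searched.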

https://github.com/equenum/postgre_uuid_performance

PostgreSQL UUID v7 vs v4 Performance Comparison

Performance Metrics Table

| Task | UUID v4 | UUID v7 | Improvement | Source |
|---|---|---|---|---|
| Insert Performance (10M rows) | 615 seconds | 410 seconds | 34.8% faster | Dev.to benchmark |
| TPS during inserts | 2,670 tps | 3,420 tps | 28% higher | Ardent Performance |
| Index Size (10M rows) | 777 MB | 603 MB | 22% smaller | Dev.to benchmark |
| Total Disk Usage (10M rows) | 1.2 GB | 1.025 GB | 175 MB smaller | Dev.to benchmark |
| Point Lookup Query Time | 2.4-2.6 ms | 0.1-0.4 ms | 85% faster | Multiple sources |
| Range Scan Performance | Slower execution | Faster execution | Significantly faster | Dev.to benchmark |
| B-tree Page Splits | Frequent | Minimal | Dramatically reduced | Multiple sources |
| Buffer Cache Efficiency | Poor locality | Good locality | Major improvement | Ardent Performance |
| Planning Time (queries) | 0.5-12 ms | 0.2-0.5 ms | ~50% faster | Seven’s Blog |
| Execution Time (queries) | 2.3-2.8 ms | 0.1-0.4 ms | 80-90% faster | Seven’s Blog |

Key Performance Characteristics

Insert Performance

  • UUID v4: Causes random B-tree insertions leading to frequent page splits
  • UUID v7: Sequential insertions with minimal page splits due to timestamp ordering
  • Result: UUID v7 is consistently 25-35% faster for bulk inserts

Storage Efficiency

  • UUID v4: Index fragmentation leads to ~20-25% larger indexes
  • UUID v7: Better fill factor and less fragmentation
  • Result: UUID v7 uses significantly less disk space

Query Performance

  • Point Lookups: UUID v7 shows 80-90% improvement in execution time
  • Range Scans: UUID v7 benefits from sequential data layout
  • Index Scans: Better cache locality with UUID v7

Memory Usage

  • Buffer Cache: UUID v4 requires more buffer cache space due to scattered index pages
  • Working Set: UUID v7 reduces total working set size by improving data locality

Test Environment Details

Hardware Specifications:

  • CPU: 8 cores / AMD EPYC Genoa 3.7 GHz
  • Memory: 16 GB RAM
  • Storage: NVMe SSD / 16k IOPS gp3
  • Database: PostgreSQL 16/17 (with UUID v7 patch)

Test Parameters:

  • Dataset size: 10 million rows
  • Batch size: 10,000 rows per insert batch
  • Concurrent queries during inserts
  • B-tree indexes (default PostgreSQL primary key index type)

Performance Scaling Observations

  1. Small datasets (<100k rows): Minimal difference between v4 and v7
  2. Medium datasets (1M+ rows): UUID v7 advantages become noticeable
  3. Large datasets (10M+ rows): UUID v7 shows dramatic performance benefits
  4. Write-heavy workloads: UUID v7 advantages are most pronounced

Sources

Note: PostgreSQL 18 will include native UUID v7 support via the uuidv7() function. Prior versions require third-party libraries or extensions to generate UUID v7 values.

@kcrandall
Author

Regarding your comment about built-in Java support, I don’t think it’s that big of a deal; the algorithm to generate it is pretty small. We can just swap out the util for the official version if it ever comes out. I don’t think it’s worth waiting for an official release in Java.

Here is a lib, but I think it’s better just to make a simple util for something so simple.
https://github.com/cowtowncoder/java-uuid-generator

C# got it last year with dotnet 9, but I’m not sure Java stays up to date as fast as dotnet does. Before the official UUIDv7, people were using ULID, which is similar; UUIDv7 is just the official version trying to solve the same problem that ULID has addressed since around 2016. They are all just random strings with different structures, so they’re usually pretty easy to interchange. They all fit into the same data type.

Contributor

@stianst stianst left a comment


We shouldn't add UUIDv7 support until it's an accepted standard, at the moment it's a proposed standard. Ideally we should also wait for Java to support it, but if there's enough demand we can always do it earlier.

Bear in mind also that UUIDv7 are longer than the current UUIDs we generate, this affects UI, tokens, etc.

We'd probably also want to consider using UUIDv7 everywhere, or at least for one type of entity, like users. In this PR it is only used when partially importing users into an existing realm, and it makes no sense to have it only there.

Finally, we don't use environment variables to expose configuration options. Not sure how we would want to make this configurable, but not through environment variables that's for sure.

@stianst
Contributor

stianst commented Aug 18, 2025

Maybe what would make sense is to add an SPI/provider for ID generation? The default provider can be what we do today, and we could add an experimental one for UUIDv7? That makes it possible for folks to use custom generators, and also provides a nice way to configure it, including the ability to have experimental or preview providers.

@kcrandall
Author

Bear in mind also that UUIDv7 are longer than the current UUIDs we generate, this affects UI, tokens, etc.

UUIDv7 is the same length as UUIDv4. they are completely interchangeable. you can insert a v4 into a database table column with all v7s and the reverse. they are both 128-bit. the only difference is how the bits are generated

@stianst
Contributor

stianst commented Aug 18, 2025

Bear in mind also that UUIDv7 are longer than the current UUIDs we generate, this affects UI, tokens, etc.

UUIDv7 is the same length as UUIDv4. they are completely interchangeable. you can insert a v4 into a database table column with all v7s and the reverse. they are both 128-bit. the only difference is how the bits are generated

You're right, my bad. I just looked at an example UUIDv7 and thought that's long(er) without actually checking ;)

@pedroigor
Contributor

An SPI makes more sense for now, but we still need to look at the impact of changing the format and usage when designing its contract.

@kcrandall Wdyt?

@kcrandall
Author

kcrandall commented Aug 18, 2025

Ya, I think that’s fine, especially if Keycloak shipped with a few easily configurable options such as uuid4 and uuid7, plus the option to provide your own. It might be a bit of over-engineering for something that is pretty simple.

The only thing I question is whether it will ever be used for anything other than UUID versions. I could see it being more useful if you could change user IDs to lots of different things, like ints or other data types.

I do feel like just having a configuration for the UUID version (3, 4, 7) and marking it as experimental for now might be more straightforward, unless you see a future where this would have more options than just UUID versions.

Also I’m not that familiar with how that is done in Java. In dotnet you would make an interface and register with DI IService container with extension methods. I assume it’s something similar in Java.

@stianst
Contributor

stianst commented Aug 19, 2025

SPI/Provider is really not over-engineering it, on the contrary really as there is no need to introduce config options, etc. I would imagine we'd start really simple and only support generating IDs and use the same provider for all IDs. It really is 10-15 minutes of effort to do. What might make it a bit more complicated is that KeycloakModelUtils.generateId doesn't have access to the KeycloakSession, which would require a bit more fiddling with the code to be able to obtain the provider, etc. Not particularly hard though.

What would make things more complicated is when we want to use different ID formats for different entities (say users have UUIDv7, while tokens or clients use something else).

@pedroigor
Contributor

@kcrandall Perhaps you can start looking at this branch https://github.com/pedroigor/keycloak/tree/uuid-spi.

It provides the bare minimum to introduce a UUIDSpi. Implementations of it will be based on implementing a UUIDProviderFactory and a UUIDProvider. As you can see from there, there is a UUIDv4Provider|Factory implementation that will generate UUIDs based on v4.

You can follow the same idea to implement a UUID v7 provider.

To set which provider the server should use at runtime, you can set a configuration option like spi-uuid--provider-default=uuid-v7.
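To make the provider idea concrete, here is a minimal, self-contained sketch of selecting an ID generator by a configured provider id. It does not use Keycloak's real SPI types and is not the code in the linked branch: `IdProvider`, `V4IdProvider`, `V7IdProvider`, and `IdProviderRegistry` are hypothetical stand-ins, and the `uuid-v4` id and the v4 default are assumptions. Only the `uuid-v7` id comes from the configuration option mentioned above.

```java
import java.security.SecureRandom;
import java.util.UUID;

// Hypothetical stand-in for a provider contract; not Keycloak's real SPI interfaces.
interface IdProvider {
    String generateId();
}

final class V4IdProvider implements IdProvider {
    @Override
    public String generateId() {
        return UUID.randomUUID().toString();
    }
}

final class V7IdProvider implements IdProvider {
    private static final SecureRandom RANDOM = new SecureRandom();

    @Override
    public String generateId() {
        // 48-bit timestamp | version 7 | 12 random bits, then variant 10 | 62 random bits.
        long msb = (System.currentTimeMillis() << 16) | 0x7000L | RANDOM.nextInt(1 << 12);
        long lsb = 0x8000000000000000L | (RANDOM.nextLong() & 0x3FFFFFFFFFFFFFFFL);
        return new UUID(msb, lsb).toString();
    }
}

final class IdProviderRegistry {
    static final String DEFAULT_ID = "uuid-v4"; // assumed default, mirroring today's behavior

    /** Resolves a provider from a configured id; null falls back to the default. */
    static IdProvider resolve(String configuredId) {
        String id = configuredId == null ? DEFAULT_ID : configuredId;
        switch (id) {
            case "uuid-v4": return new V4IdProvider();
            case "uuid-v7": return new V7IdProvider();
            default: throw new IllegalArgumentException("Unknown ID provider: " + id);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("uuid-v7").generateId());
        System.out.println(resolve(null).generateId());
    }
}
```

Callers depend only on the `IdProvider` contract, so switching formats becomes a configuration change rather than a code change.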

In that branch, I've also updated the UserCreateTest to run using the UUID v7 provider. Just to allow you to check how the mechanism works when choosing a specific SPI provider implementation.

There are quite a few things yet to look at but I hope this will give you something to start with. As Stian said, we might want to have UUID versions set on a per-resource-type basis (e.g.: users, realm, clients, tokens, sessions, etc) and a mechanism to decide which provider to use based on the resource type.

However, if we can start with users, perhaps we can already come up with something that you and others can evaluate.

@kcrandall
Author

kcrandall commented Aug 19, 2025

@pedroigor that branch looks very good; I was misunderstanding how this flow would work in the project. I think starting with users is the best way to go. It's the use case that 99% of people would care about the most. I think extending to other objects is something to do as people ask for it. It looks like you added the uuidv7 provider, so this looks like it's good to go?

@pedroigor
Contributor

It looks like you added the uuidv7 provider, so this looks like it's good to go?

It depends on how we want to use this SPI with different resource types. If we want to start with users while at the same time be prepared to expand to other realm resource types (clients, groups, sessions, etc), tokens (not sure if it makes sense), etc, we need more work.

I would like to have an issue open for discussion where we provide at least an overview of where UUIDs are used, so that we can build a plan about how we are going to refactor the code to allow changing the UUID format. It should help to get more insights in the design of this SPI and how we are going to configure it.

Another thing we need to consider is testing. And I would like to run some benchmarks using the keycloak-benchmark tool.

But yeah, if we want to start with users. I think that should be enough. I also need to check with @stianst and others how much we should prioritize it.

@stianst
Contributor

stianst commented Aug 20, 2025

Why start with users and not just do all UUIDs? The simplest approach is just to use UUIDv7 everywhere. They have the same format, so it shouldn't really matter, and if it helps databases deal with them more efficiently then that's a win for everything?

@kcrandall
Author

I would have no problem with doing it everywhere. Whatever is easier

@stianst
Contributor

stianst commented Aug 20, 2025

@pedroigor I think we should just start with it everywhere, to see if that works fine and, as you said, if we can get any perf benefits from it. It would be the non-default provider, and marked as experimental for now, I'd say.

@pedroigor
Contributor

pedroigor commented Aug 20, 2025

I think we should just start with it everywhere

By everywhere, we are talking about realm resource types like:

  • Users
  • Roles
  • Groups
  • Clients
  • Components
  • Etc

But not:

  • Tokens
  • Sessions
  • Need to check for more

For tokens, I don't think it makes much sense because we do not persist them. For sessions, using UUIDv7 can potentially help with session fixation|enumeration|hijack?

That is why I would like to make clearer where this SPI would apply, and start with users to evaluate the benefits of using it before expanding to others.

But if we can at least decide on the resources we want to allow to change the UUID, I'm also open to having it "everywhere".

@stianst
Contributor

stianst commented Aug 21, 2025

By everywhere I mean everywhere.

Tokens, sessions, events, etc. are (or can be) stored in the database as well. UUIDv7 still has sufficient randomness, and are unique, we also don't ever rely on just a UUID to prove ownership of anything like a token or session. We use signatures for that.

I'd be more than open to have the SPI support different types of IDs in the future, perhaps something like #generateId(IdType type).
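A typed entry point along those lines could be sketched as follows. This is only an illustration of the `generateId(IdType type)` idea: the `IdType` values, the `TypedIdGenerator` class, and the per-type registration are hypothetical, and the `user-` prefixed generator in `main` is a placeholder for whatever time-ordered generator would actually be plugged in.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical resource kinds; the real set would come from a review of the codebase.
enum IdType { USER, CLIENT, SESSION, TOKEN }

// Hypothetical sketch: dispatch ID generation by resource type with a shared fallback.
final class TypedIdGenerator {
    private final Map<IdType, Supplier<String>> perType = new EnumMap<>(IdType.class);
    private final Supplier<String> fallback;

    TypedIdGenerator(Supplier<String> fallback) {
        this.fallback = fallback;
    }

    /** Overrides the generator for one resource type; returns this for chaining. */
    TypedIdGenerator register(IdType type, Supplier<String> generator) {
        perType.put(type, generator);
        return this;
    }

    /** Uses the type-specific generator if one is registered, else the fallback. */
    String generateId(IdType type) {
        return perType.getOrDefault(type, fallback).get();
    }

    public static void main(String[] args) {
        TypedIdGenerator ids = new TypedIdGenerator(() -> UUID.randomUUID().toString())
                .register(IdType.USER, () -> "user-" + UUID.randomUUID()); // placeholder generator
        System.out.println(ids.generateId(IdType.USER));
        System.out.println(ids.generateId(IdType.SESSION));
    }
}
```

With this shape, starting with a single format everywhere means registering nothing and relying on the fallback; per-type formats can be added later without changing callers.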

However, the first thing we should really do is just have all IDs replaced by UUID v7 with an experimental provider, so some benchmarking can be done easily to see if there is any real value.

@ahus1
Contributor

ahus1 commented Aug 21, 2025

@stianst - I disagree with your statement as described in #38663

Tokens, sessions, events, etc. are (or can be) stored in the database as well. UUIDv7 still has sufficient randomness, and are unique, we also don't ever rely on just a UUID to prove ownership of anything like a token or session. We use signatures for that.

The requirements of NIST and ANSSI differ here: In some places at least 128 bits of randomness are required, and UUIDv7 only has 74 random bits while the others are a timestamp. So UUIDv7 is not a fit for those places.

@stianst
Contributor

stianst commented Aug 21, 2025

@stianst - I disagree with your statement as described in #38663

Tokens, sessions, events, etc. are (or can be) stored in the database as well. UUIDv7 still has sufficient randomness, and are unique, we also don't ever rely on just a UUID to prove ownership of anything like a token or session. We use signatures for that.

The requirements of NIST and ANSSI differ here: In some places at least 128 bits of randomness are required, and UUIDv7 only has 74 random bits while the others are a timestamp. So UUIDv7 is not a fit for those places.

Could you elaborate? We shouldn't be using UUIDs in places where they are used as more than an identifier. Take tokens, for instance: it's the signature that provides the security, not the jti.

KeycloakModelUtils.generateId is to generate a unique ID, not a secure reference. If it's used to generate "secrets" that's incorrect use.
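The distinction between an identifier and a secret can be illustrated with a small sketch. It is not Keycloak code: the class `IdVersusSecretSketch` and its methods are hypothetical. A v4 UUID carries 122 random bits and a v7 UUID 74, so a value that must carry a full 128 bits of randomness is drawn directly from `SecureRandom` instead of being derived from either UUID version.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.UUID;

// Hypothetical sketch: identifiers and secrets come from different generators.
final class IdVersusSecretSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** A unique identifier; uniqueness matters, unguessability is not the goal. */
    static String newIdentifier() {
        return UUID.randomUUID().toString();
    }

    /** A 128-bit random secret, URL-safe Base64 without padding (22 characters). */
    static String newSecret() {
        byte[] bytes = new byte[16]; // 16 bytes = 128 bits, all random
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    public static void main(String[] args) {
        System.out.println("identifier: " + newIdentifier());
        System.out.println("secret:     " + newSecret());
    }
}
```

Keeping the two generators separate means the ID format can change without affecting any value whose security depends on randomness.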

@stianst
Contributor

stianst commented Aug 21, 2025

After discussing with @ahus1 I think this needs a bit more consideration, and review of the codebase before we introduce something like this

@pedroigor
Contributor

After discussing with @ahus1 I think this needs a bit more consideration, and review of the codebase before we introduce something like this

+1


Development

Successfully merging this pull request may close these issues.

Is there any plan to migrate user id from uuid v4 to uuid v7?

4 participants