Skip to content

Conversation

@ahus1
Copy link
Contributor

@ahus1 ahus1 commented Sep 8, 2025

Closes #40374

Follow-ups:

  • add more validations for an existing user session
  • apply validation of existing user session also in other places (Embedded Infinispan, Standard Infinispan, Remote Infinispan)
  • follow-up issue to validate generated IDs of an authentication session

@ahus1 ahus1 self-assigned this Sep 8, 2025
@ahus1
Copy link
Contributor Author

ahus1 commented Sep 8, 2025

Hello @mposolda and @pruivo - I had a look how EmbeddedCachesChangesPerformer implements it by applying all updates to the entity retrieved from the cache.

case ADD_IF_ABSENT:
SessionEntityWrapper<V> existing = cache.putIfAbsent(key, sessionWrapper, task.getLifespanMs(), TimeUnit.MILLISECONDS, task.getMaxIdleTimeMs(), TimeUnit.MILLISECONDS);
if (existing != null) {
LOG.debugf("Existing entity in cache for key: %s . Will update it", key);
// Apply updates on the existing entity and replace it
task.runUpdate(existing.getEntity());
replace(key, task, existing, task.getLifespanMs(), task.getMaxIdleTimeMs());
} else {
LOG.tracef("Add_if_absent successfully called for entity '%s' to the cache '%s' . Lifespan: %d ms, MaxIdle: %d ms", key, cache.getName(), task.getLifespanMs(), task.getMaxIdleTimeMs());
}
break;

So I reworked JpaChangesPerformer to first read from the database before then continuing to either create the entity updating it.

This raises the cost of creating a new session by one additional SELECT as part of an already existing transaction. While this is a round-trip to the database which is costly, the impact on the database should be smaller than the already existing INSERT. This could be in a later step be optimized in a more opportunistic way to try a plain insert without a select on the first attempt, and a select on any subsequent attempt.

If that transaction will fail, it will then retry as part of the existing retry mechanism we have in place in either PersistentSessionsWorker (batching enabled) or PersistentSessionsChangelogBasedTransaction (batching disabled).

Please let me know your thoughts. Can you think of any additional checks that I should add if a session is found in addition to the session being new and is matching the current user?

@ahus1 ahus1 force-pushed the is-40374-unique-constraint branch from 3fdcd8e to be787c2 Compare September 8, 2025 19:58
keycloak-github-bot[bot]

This comment was marked as outdated.

@ahus1 ahus1 force-pushed the is-40374-unique-constraint branch 2 times, most recently from 6c6e3b8 to 8cd5c2b Compare September 9, 2025 09:11
keycloak-github-bot[bot]

This comment was marked as outdated.

@ahus1 ahus1 force-pushed the is-40374-unique-constraint branch 2 times, most recently from 77a9546 to 36ab7b1 Compare September 9, 2025 10:13
Comment on lines 192 to 195
do {
id = keyGenerator.generateKeyString(session, sessionTx.getCache());
// prevent potential duplicate ID generation
} while (sessionTx.get(id) != null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uh!? What's the probability of a conflict? 1 in 2^122? It seems unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll leave that for a separate PR

Comment on lines 533 to 416
} else if (merged.getOperation() == SessionUpdateTask.CacheOperation.ADD_IF_ABSENT) {
// This might happen if the user logs in via multiple tabs at the same time from an external broker, and the same authentication session creates
// multiple user sessions concurrently.
if (!Objects.equals(userSessionModel.getUserId(), entity.getUser())) {
throw new ModelIllegalStateException("User ID of the session does not match, the user ID should not change");
}
if (Math.abs(userSessionModel.getStarted() - entity.getStarted()) > 10) {
throw new ModelIllegalStateException("Session has already aged, concurrent requests should not happen");
}
// If the above matches, it is safe to continues to apply all updates to the entity as before in EmbeddedCachesTransformer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is enormous and difficult to read, but I think this is dead code. You're invoking createUserSession before doing loadUserSession, so it should return the entity you just created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the createUserSession was left-over from a manual test; I should have never added it.

This is now cleaned up, and there is a new test testConcurrentSessionCreation which tests the functionality even without concurrency.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand these checks, mainly because we don't care for those with volatile sessions; we just merge the changed fields.
My other concern is that it will fail all 20 retries we have.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say this is defensive programming, as bad things would happen if we connect a new user to an existing session that is bound to a different user. A user ID is assumed to be constant.

As an example: In #38671 where a user deleted their account, we forgot to start a new authentication session, and that lead to the next log in in that browser tab to use the same user session ID. The database showed a duplicate key error. If we would just continue with the old session, we would continue with a wrong user in that session.

Yes, I agree those 20 retries will fail, and I'd rather see the error and follow up on the cause, than having two user sessions mixed up (which is IMHO the very worst case that could happen for an IAM).

Checking the user ID is IMHO the minimal check. I'd rather add more fields to the check, but probably not as a backport and only if Marek agrees. We could have a follow-up issue for the other stores as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it very confusing to have the same session ID for different users. It looks like a problem in the authentication flow or in the cookies.

Other suggestions:

  • I would like a comment about the started timestamp and why the value 10.
  • Do not retry when ModelIllegalStateException is thrown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pruivo - Thank you for the feedback. The latest commit addresses your review comments. There are now code comments explaining the why, and the retry is skipped on a ModelIllegalStateException

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that the I needed to revert the change and keep the retries. In run https://github.com/keycloak/keycloak/actions/runs/17609154462/job/50027163556?pr=42451 the test UserSessionConcurrencyTest.testConcurrentNotesChange showed that there can be an optimistic lock exception that is then later converted to an ModelIllegalStateException here:

throw new ModelIllegalStateException("Database operation failed", t);

I'd say this retry is rare and won't have a performance implications as those new exceptions would only occur in error cases.

} else {
PersistentUserSessionAdapter userSessionModel = (PersistentUserSessionAdapter) userSessionPersister.loadUserSession(realm, entry.getKey().toString(), entity.isOffline());
if (userSessionModel != null) {
if (merged.getOperation() == SessionUpdateTask.CacheOperation.ADD) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we use CacheOperation.ADD anywhere in the code. Can we eliminate it to simplify things a bit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is "just" a safeguard, and I would like to keep it. The operation is used with other caches, and unfortunately this enum isn't specific to this cache.

As I don't know how to simplify it, I would keep it as is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A switch may be simpler to read

        switch (merged.getOperation()) {
            case REMOVE:
                remove();
                break;
            case ADD, ADD_IF_ABSENT:
                entity = load();
                if (entity == null) {
                    create();
                } else {
                    merge();
                }
                break;
            case REPLACE:
                entity = load();
                if (entity != null) {
                    merge();
                }
                break;
        }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed this in a refactoring. Please have a look.

userSessionQuery.setMaxResults(1);

Stream<OfflineUserSessionModel> persistentUserSessions = closing(userSessionQuery.getResultStream().map(this::toAdapter));
Stream<OfflineUserSessionModel> persistentUserSessions = closing(userSessionQuery.getResultStream().map(this::toAdapter).filter(Objects::nonNull));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use userSessionQuery.getSingleResultOrNull()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewritten to be imperative and not using streams.

sessionManager = new UserSessionManager(currentSession);
KeycloakModelUtils.runJobInTransaction(session.getKeycloakSessionFactory(), session.getContext(), currentSession -> {
RealmModel realm = currentSession.realms().getRealmByName("test");
UserSessionManager sessionManager = new UserSessionManager(currentSession);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused

currentSession = sessionCR1;
sessionManager = new UserSessionManager(currentSession);
String realmId = KeycloakModelUtils.runJobInTransactionWithResult(session.getKeycloakSessionFactory(), currentSession -> {
UserSessionManager sessionManager = new UserSessionManager(currentSession);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused

currentSession.users().addUser(realm, "user2").setEmail("user2@localhost");
}
sessionManager = new UserSessionManager(currentSession);
UserSessionManager sessionManager = new UserSessionManager(currentSession);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsued

@ahus1 ahus1 force-pushed the is-40374-unique-constraint branch 3 times, most recently from 1bf2369 to 3327ebf Compare September 9, 2025 12:31
@ahus1 ahus1 requested a review from pruivo September 9, 2025 12:35
@ahus1
Copy link
Contributor Author

ahus1 commented Sep 9, 2025

@pruivo - I've reworked the code, please have another look. The unused snippets in the test class have been cleaned up together with other warnings.

@keycloak keycloak deleted a comment from keycloak-github-bot bot Sep 9, 2025
@keycloak keycloak deleted a comment from keycloak-github-bot bot Sep 9, 2025
@ahus1 ahus1 removed the flaky-test label Sep 9, 2025
Copy link

@keycloak-github-bot keycloak-github-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unreported flaky test detected, please review

@keycloak-github-bot
Copy link

Unreported flaky test detected

If the flaky tests below are affected by the changes, please review and update the changes accordingly. Otherwise, a maintainer should report the flaky tests prior to merging the PR.

org.keycloak.testsuite.broker.KcOidcBrokerTest#testPostBrokerLoginFlowWithOTP

Keycloak CI - Java Distribution IT (windows-latest - temurin - 17)

org.openqa.selenium.TimeoutException: 
java.net.SocketTimeoutException: Read timed out
Build info: version: '4.28.1', revision: '73f5ad48a2'
System info: os.name: 'Windows Server 2025', os.arch: 'amd64', os.version: '10.0', java.version: '17.0.16'
Driver info: driver.version: HtmlUnitDriver
...

Report flaky test

@mposolda
Copy link
Contributor

mposolda commented Sep 10, 2025

@ahus1 @pruivo Thanks a lot for the updates! I will be fine with whatever optimizations at the store level you guys agree with. I will try to do some tests later today or tomorrow to confirm if the fix actually works for me to fix the issue . I will probably use the steps from this comment, which I've used before to simulate the issue.

@mposolda mposolda self-requested a review September 10, 2025 07:55
@ahus1 ahus1 force-pushed the is-40374-unique-constraint branch from d90f91f to 2f1f64c Compare September 10, 2025 11:19
@ahus1 ahus1 force-pushed the is-40374-unique-constraint branch from 2f1f64c to 4e7d87f Compare September 11, 2025 09:35
@ahus1
Copy link
Contributor Author

ahus1 commented Sep 11, 2025

PR rebased to resolve conflicts. Please let me know if you are ok to merge this one @mposolda

Copy link
Contributor

@mposolda mposolda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahus1 Thanks! I've did some testing and not able to simulate anymore the scenario from #40374 (comment) . So from the functional perspective, it looks good to me and I am approving.

I did not review the details of the storage changes and did not review performance. Might be good to have review/approve from @pruivo as well, but leaving it to you.

@ahus1
Copy link
Contributor Author

ahus1 commented Sep 11, 2025

@pruivo - can you please re-approve?

@ahus1 ahus1 merged commit 6a20214 into keycloak:main Sep 11, 2025
77 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Random but frequent duplicate key value violates unique constraint \"constraint_offl_us_ses_pk2\" errors

3 participants