Skip to content

Conversation

@ahus1
Copy link
Contributor

@ahus1 ahus1 commented Dec 18, 2025

Closes #45015
Closes #43443

@ahus1 ahus1 self-assigned this Dec 18, 2025
@ahus1 ahus1 force-pushed the is-45015-sync-logging-from-nonblocking-threads branch from 3ba5173 to 18d203e Compare December 18, 2025 21:29
@ahus1
Copy link
Contributor Author

ahus1 commented Dec 18, 2025

@pruivo - please have a look when you have the time. Thanks!

@shawkins
Copy link
Contributor

@ahus1 the virtual thread issue was one of the reasons we added general support for async logging. Would you consider generally defaulting it to enabled when virtual threads are in use?

@ahus1
Copy link
Contributor Author

ahus1 commented Dec 18, 2025

Would you consider generally defaulting it to enabled when virtual threads are in use?

Maybe, maybe not. Async logging might leak into some other parts, still we could try to default to it once we are confident it works ok.

One thing that I'm concerned with is that the queue length is limited to 512 entries. Once that is reached, it will block, so I'd rather keep this suggested fix as it will not have a max queue length.

I hope we can soon move to JDK 25 where this will no longer be a problem, as it doesn't have pinned threads. So once we use JDK 25, we wouldn't need to make async logging the new default.

So TL;DR I'd be leaning towards not making async logging the default, at least not to fix the issue at hand. There might be other good reasons to still enable it.

@shawkins
Copy link
Contributor

One thing that I'm concerned with is that the queue length is limited to 512 entries. Once that is reached, it will block, so I'd rather keep this suggested fix as it will not have a max queue length.

If you don't agree with the general default, changing that seems like a better approach and than adding special purpose async logging.

cc @mabartos

@shawkins
Copy link
Contributor

One thing that I'm concerned with is that the queue length is limited to 512 entries. Once that is reached, it will block

Keep in mind that if it does block, that won't typically lead to starvation. The logging is being done on a dedicated thread, so as long as the queue can be worked down at all, then processing will resume.

@ahus1
Copy link
Contributor Author

ahus1 commented Dec 18, 2025

If you don't agree with the general default, changing that seems like a better approach and than adding special purpose async logging.

Well, for asynchronous logging it is generally a good idea have a fixed buffer, as otherwise you can run out of memory when you log faster than you can write to your log sinks. Still with our delicate pinned threads, we never want to get pinned, and therefore an unbounded queue for those entries might be the smaller of two evils here.

Keep in mind that if it does block, that won't typically lead to starvation.

Let's hope for that, though I'm not sure about that yet.

Maybe we can make async logging the default for an upcoming minor release. I assume async logging can then still be disabled?

IMHO we still need a fix that can be backported to 26.2/26.4, as those are affected by this as well, and I don't think we would switch those to async logging by default in a patch release.

@ahus1
Copy link
Contributor Author

ahus1 commented Dec 18, 2025

@shawkins - it seems that CRDTest.testUpdateSpecValidation:110 is failing, which I think is unrelated. Any ideas?

@shawkins
Copy link
Contributor

shawkins commented Dec 18, 2025

Let's hope for that, though I'm not sure about that yet.

The issue with these deadlocks / starvations is that a virtual thread is responsible for doing the logging work. If all virtual threads become pinned, the logging that needs to happen clear the lock isn't guarenteed.

With a dedicated thread logging work can always proceed.

Maybe we can make async logging the default for an upcoming minor release. I assume async logging can then still be disabled?

Yes, an explicit configuration can override the default.

IMHO we still need a fix that can be backported to 26.2/26.4, as those are affected by this as well, and I don't think we would switch those to async logging by default in a patch release.

No, but you can certainly direct users to enable log-async in these situations.

@shawkins
Copy link
Contributor

@shawkins - it seems that CRDTest.testUpdateSpecValidation:110 is failing, which I think is unrelated. Any ideas?

It is unrelated. It looks like we're somehow running a different api server version in CI, so the error message is slightly different. I put a change as a second commit in this pr #45010 - but it can put into a separate pr if needed.

@ahus1
Copy link
Contributor Author

ahus1 commented Dec 18, 2025

No, but you can certainly direct users to enable log-async in these situations.

I previously assumed it would happen only when DEBUG is enabled, but now we're seeing happening with INFO level logging, which concerns me. It IMHO means it could happen anytime, and "directing users to enable log-async in these situations" wouldn't be enough.

We could choose to not use the code in this PR if async logging is enabled, but enable it when async logging is not enabled, virtual threads are enabled and JDK 25 is not used. 🙄

@ahus1
Copy link
Contributor Author

ahus1 commented Dec 18, 2025

I put a change as a second commit in this pr #45010 - but it can put into a separate pr if needed.

The failing CI might block other PRs as well. As long as #45010 is merged rather soon, I don't mind either way.

}
}

if (Thread.currentThread().getName().startsWith("non-blocking-thread")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most virtual threads do not have names, and JGroups vthreads do not start with non-blocking-thread.
You probably want to use org.infinispan.commons.jdkspecific.ThreadCreator#isVirtual(Thread)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I was hoping someone would have implemented something like that to make the detection available independent of the JDK version! Applied in next commit - ready for one more review.

@shawkins
Copy link
Contributor

I put a change as a second commit in this pr #45010 - but it can put into a separate pr if needed.

The failing CI might block other PRs as well. As long as #45010 is merged rather soon, I don't mind either way.

Opened #45018

Signed-off-by: Alexander Schwartz <[email protected]>
@pruivo
Copy link
Member

pruivo commented Dec 19, 2025

@ahus1, after sleeping over the subject, I have to give a 👎 to the PR. It makes more sense to disable vthreads if the core count is less than 8 or 16. With such a low core count, Keycloak does not generate enough load to overload the JGroups thread pool. We have been recommending that users disable it, and we haven't received any reports about JGroups thread pools or queues.

@ahus1
Copy link
Contributor Author

ahus1 commented Dec 19, 2025

I would still like to give it a chance. Let's discuss next year.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deadlock in Infinispan virtual threads Keycloak should warn when ISPN or JGROUPS is running in debug level logging

3 participants