
triton inference: use urllib for health check #19

Merged
YassineYousfi merged 7 commits into master from health
Feb 16, 2026

@YassineYousfi
Contributor

No description provided.

@YassineYousfi
Contributor Author

This works around an evhtp thread-pinning issue in InferenceServerClient:
(1) The InferenceServerClient health check uses a keep-alive connection, so it stays pinned to thread 0.
(2) Model work starts on thread 0 and can block it for more than 60 seconds.
(3) The next InferenceServerClient health check reuses the connection on thread 0 and gets no response. In contrast, urllib (or a freshly created client) opens a new connection, which is handled by thread 1 and gets a response immediately.
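For reference, a urllib-based readiness probe can be as small as the sketch below. It assumes Triton's HTTP endpoint is reachable at the default port 8000 and uses the standard GET /v2/health/ready route; the function name is_triton_ready and its parameters are hypothetical, not the code merged in this PR.

```python
import http.client
import urllib.request


def is_triton_ready(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """Probe Triton's readiness endpoint over a fresh HTTP connection.

    Hypothetical helper: returns True only if GET /v2/health/ready answers 200.
    """
    url = base_url.rstrip("/") + "/v2/health/ready"
    try:
        # urlopen sends "Connection: close", so every call opens a new TCP
        # connection instead of reusing one pinned to a busy server thread.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (non-2xx status) and socket timeouts subclass OSError;
        # HTTPException covers malformed responses that urlopen does not wrap.
        return False
```

The timeout bounds how long a stalled server can hold up the caller; since every probe is a new connection, a slow answer points at the whole server being busy rather than at one pinned thread.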

Root cause confirmed by running model loading and health checks in a tight loop, with the health check performed using each of the following methods:
(i) reusing the same health check client (the current setup)
(ii) creating a short-lived health check client for each request
(iii) urllib

With the default --http-thread-count value (8), only (i) fails. With --http-thread-count 1, all methods fail, which is consistent with the thread-pinning explanation: with a single HTTP thread, even a new connection lands on the blocked thread.
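The outcome matrix of this experiment can be reproduced with a toy model of connection pinning. This is an illustrative simulation, not Triton or evhtp code: ToyHttpServer, probe, and the method labels are hypothetical, and the model assumes, as a simplification, that a new connection is handed to the lowest-numbered idle worker while a keep-alive connection stays on the worker it was first assigned to.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHttpServer:
    """Toy model (hypothetical): each connection is pinned to one worker thread."""

    thread_count: int
    busy: set = field(default_factory=set)  # worker ids blocked by model work
    conn_to_worker: dict = field(default_factory=dict)

    def open_connection(self) -> int:
        # Assumption: a new connection goes to the lowest-numbered idle worker,
        # or to worker 0 if every worker is busy.
        idle = [w for w in range(self.thread_count) if w not in self.busy]
        worker = idle[0] if idle else 0
        conn_id = len(self.conn_to_worker)
        self.conn_to_worker[conn_id] = worker
        return conn_id

    def health_check(self, conn_id: int) -> bool:
        # A request is answered only if the connection's worker is not blocked.
        return self.conn_to_worker[conn_id] not in self.busy


def probe(method: str, thread_count: int) -> bool:
    """Return True if a health check made with `method` gets a response."""
    server = ToyHttpServer(thread_count)
    # (1) A long-lived client opens its keep-alive connection while the server is idle.
    persistent = server.open_connection()
    # (2) Model work starts on worker 0 and blocks it.
    server.busy.add(0)
    # (3) Health check: reuse the pinned connection, or open a new one.
    if method == "reused_client":  # (i)
        return server.health_check(persistent)
    if method in ("short_lived_client", "urllib"):  # (ii), (iii): both open a new connection
        return server.health_check(server.open_connection())
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    for threads in (8, 1):
        for method in ("reused_client", "short_lived_client", "urllib"):
            status = "ok" if probe(method, threads) else "FAIL"
            print(f"--http-thread-count {threads}: {method}: {status}")
```

Running it reports FAIL only for reused_client at 8 threads and FAIL for all three methods at 1 thread, matching the observations reported for the real server. The model deliberately treats (ii) and (iii) identically, because in this explanation the only thing that matters is whether the probe reuses the pinned connection.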

@YassineYousfi marked this pull request as ready for review February 16, 2026 21:11
@YassineYousfi merged commit b9122d4 into master Feb 16, 2026
1 check passed

1 participant