Threaded resolver shutdown can hang and/or leak memory #18532

@jblazquez

Description

I did this

Upgraded our application from curl 8.15.0 to 8.16.0 with no other changes.

While I don't have all of the details at this time, here are two issues that we immediately noticed which seem to have been introduced with #18263.

  1. Intermittent getaddrinfo memory leak on shutdown:

Running our application with LeakSanitizer enabled, performing a single HTTP request with libcurl, and quickly exiting the application (potentially before the request completes) produces a LeakSanitizer report about 10% of the time. The report shows a leak in which freeaddrinfo is apparently never called:

==275324==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 64 byte(s) in 1 object(s) allocated from:
    #0 0x5a3b33c486db in malloc asan_malloc_linux.cpp:67:3
    #1 0x73e4d375b639 in generate_addrinfo nss/getaddrinfo.c:1081:16
    #2 0x73e4d375b639 in gaih_inet nss/getaddrinfo.c:1199:12
    #3 0x73e4d375b639 in getaddrinfo nss/getaddrinfo.c:2391:12
    #4 0x5a3b33be862c in getaddrinfo sanitizer_common_interceptors.inc:2893:13
    #5 0x5a3b3604a1d2 in Curl_getaddrinfo_ex lib/curl_addrinfo.c:122:11
    #6 0x5a3b3610bb9f in getaddrinfo_thread lib/asyn-thrdd.c:252:10
    #7 0x5a3b3604bd23 in curl_thread_create_thunk lib/curl_threads.c:57:3
    #8 0x5a3b33c4638d in asan_thread_start(void*) asan_interceptors.cpp:239:28
    #9 0x73e4d369caa3 in start_thread nptl/pthread_create.c:447:8

Indirect leak of 128 byte(s) in 2 object(s) allocated from:
    #0 0x5a3b33c486db in malloc lib/asan/asan_malloc_linux.cpp:67:3
    #1 0x73e4d375b639 in generate_addrinfo nss/getaddrinfo.c:1081:16
    #2 0x73e4d375b639 in gaih_inet nss/getaddrinfo.c:1199:12
    #3 0x73e4d375b639 in getaddrinfo nss/getaddrinfo.c:2391:12
    #4 0x5a3b33be862c in getaddrinfo sanitizer_common_interceptors.inc:2893:13
    #5 0x5a3b3604a1d2 in Curl_getaddrinfo_ex lib/curl_addrinfo.c:122:11
    #6 0x5a3b3610bb9f in getaddrinfo_thread lib/asyn-thrdd.c:252:10
    #7 0x5a3b3604bd23 in curl_thread_create_thunk lib/curl_threads.c:57:3
    #8 0x5a3b33c4638d in asan_thread_start(void*) lib/asan/asan_interceptors.cpp:239:28
    #9 0x73e4d369caa3 in start_thread nptl/pthread_create.c:447:8

SUMMARY: AddressSanitizer: 192 byte(s) leaked in 3 allocation(s).
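
For context on how a getaddrinfo result can leak from a worker thread in general: the list returned by getaddrinfo has to be released with freeaddrinfo, and if the thread that owns it is terminated at a later cancellation point before handing the list over, nothing frees it unless a cleanup handler does. The following is a minimal sketch of that pattern, not curl's code. The names `resolver_thread`, `free_result_on_cancel`, `cancel_after_resolve`, `resolved` and `freed` are hypothetical, and the lookup uses a numeric address so that no DNS traffic is needed.

```c
#include <assert.h>
#include <netdb.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for this demo only. */
static atomic_int resolved; /* set once getaddrinfo has returned */
static atomic_int freed;    /* freeaddrinfo calls made by the handler */

/* Cleanup handler: runs when the thread is cancelled. */
static void free_result_on_cancel(void *arg)
{
  struct addrinfo **res = arg;
  if(*res) {
    freeaddrinfo(*res);
    *res = NULL;
    atomic_fetch_add(&freed, 1);
  }
}

static void *resolver_thread(void *arg)
{
  struct addrinfo hints;
  struct addrinfo *res = NULL;
  (void)arg;

  memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_INET;
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = AI_NUMERICHOST; /* numeric address, no DNS lookup */

  /* Without this handler, a cancel after getaddrinfo leaks the list. */
  pthread_cleanup_push(free_result_on_cancel, &res);
  if(getaddrinfo("127.0.0.1", NULL, &hints, &res) != 0)
    res = NULL;
  atomic_store(&resolved, 1);
  for(;;)
    sleep(1); /* cancellation point; stands in for later work */
  pthread_cleanup_pop(1);
  return NULL;
}

/* Cancels the resolver after it holds a result; returns the number of
   freeaddrinfo calls made by the cleanup handler. */
static int cancel_after_resolve(void)
{
  pthread_t tid;
  int rc;

  atomic_store(&resolved, 0);
  atomic_store(&freed, 0);
  rc = pthread_create(&tid, NULL, resolver_thread, NULL);
  assert(rc == 0);
  while(!atomic_load(&resolved))
    sched_yield();
  pthread_cancel(tid);
  pthread_join(tid, NULL);
  return atomic_load(&freed);
}

int main(void)
{
  printf("freeaddrinfo calls from cleanup handler: %d\n",
         cancel_after_resolve());
  return 0;
}
```

Removing the pthread_cleanup_push/pthread_cleanup_pop pair from this sketch should leave the addrinfo list allocated when the thread is cancelled, which is the kind of allocation LeakSanitizer reports.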

  2. Intermittent hang in async_thrdd_shutdown

Similarly, running our application many times in a loop results in a hang about 5% of the time. The hang occurs in async_thrdd_shutdown with the following callstack:

#0  futex_wait (private=0, expected=2, futex_word=0x74a0c3ae7a30) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x74a0c3ae7a30, private=0) at ./nptl/lowlevellock.c:49
#2  in lll_mutex_lock_optimized (mutex=0x74a0c3ae7a30) at ./nptl/pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=mutex@entry=0x74a0c3ae7a30) at ./nptl/pthread_mutex_lock.c:93
#4  async_thrdd_shutdown (data=data@entry=0x75d0c3af2100) at lib/asyn-thrdd.c:533
#5  Curl_async_thrdd_shutdown (data=0x75d0c3af2100) at lib/asyn-thrdd.c:585
#6  Curl_async_shutdown (data=0x75d0c3af2100) at lib/asyn-base.c:200
#7  multi_done (data=0x75d0c3af2100, status=<optimized out>, premature=false) at lib/multi.c:654
#8  multi_runsingle (multi=<optimized out>, nowp=<optimized out>, data=0x75d0c3af2100) at lib/multi.c:2596
#9  curl_multi_perform (m=<optimized out>, running_handles=<optimized out>) at lib/multi.c:2771

The hang is at the line that acquires the mutex in async_thrdd_shutdown (lib/asyn-thrdd.c:533, frame #4 above).

At the time of the hang, the contents of the addr_ctx->mutx variable look like this:

$2 = {__data = {__lock = 2, __count = 0, __owner = 263166, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\376\003\004\000\001", '\000' <repeats 26 times>, __align = 2}

In particular, __lock = 2 which I believe means the mutex is currently locked, and __owner is TID 263166 which - at the time of the hang - does not refer to any running thread. I assume that TID refers to the no-longer-running async resolver thread.

What I think might have happened is that pthread_cancel was called on the resolver thread while the thread had the mutex locked and before it had a chance to unlock it, which of course is catastrophic. This might also explain the intermittent getaddrinfo memory leak.

I expected the following

The async resolver does not leak or hang.

curl/libcurl version

curl 8.16.0

operating system

Ubuntu Linux 24.04
