Skip to content

Conversation

ojwb
Copy link
Member

@ojwb ojwb commented Oct 14, 2022

Arrange that destructors of local C++ objects in the wrapper function get run before we panic.

WIP: Causes some testcases to fail.

Fixes #1981

Arrange that destructors of local C++ objects in the wrapper function
get run before we panic.

WIP: Causes some testcases to fail.

Fixes swig#1981
@ojwb ojwb added the Go label Oct 14, 2022
@ojwb
Copy link
Member Author

ojwb commented Oct 14, 2022

The patch here changes the wrapper function structure to look like this:

Foo *_wrap_trigger_internal_swig_exception__SWIG_0_exception_memory_leak_bd1df5bf80fe08f6(_gostring_ _swig_go_0, Foo *_swig_go_1) {
  const char * SWIGUNUSED _swig_gopanic_message = NULL;
  Foo *_swig_go_result;
  
  {
    std::string *arg1 = 0 ;
    Foo *arg2 = (Foo *) 0 ;
    Foo *result = 0 ;
    
    
    std::string arg1_str(_swig_go_0.p, _swig_go_0.n);
    arg1 = &arg1_str;
    
    {
      arg2 = new Foo;
    }
    
    result = (Foo *)trigger_internal_swig_exception((std::string const &)*arg1,arg2);
    {
      if (result == NULL) {
        SWIG_exception(SWIG_RuntimeError, "Let's see how the bindings manage this exception!");
      }
      result = NULL;
    }
    {
      Foo::inc_freearg_count();
      delete arg2;
    }
    return _swig_go_result;
    
    fail: SWIGUNUSED;
    {
      Foo::inc_freearg_count();
      delete arg2;
    }
  }
  _swig_gopanic_impl(_swig_gopanic_message);
  return _swig_go_result;
}

_swig_gopanic is now a macro:

#define _swig_gopanic(M) do { _swig_gopanic_message = (M); goto fail; } while (0)

This means that the cleanup code (Foo::inc_freearg_count(); delete arg2;) is now run when _swig_gopanic() (and things which call it such as SWIG_exception()) are used. Also C++ object destructors for any function-local objects now get called (arg1_str in the example above).

I haven't investigated all the failures, but one awkward one is when the const char* passed to _swig_gopanic() is no longer valid after the goto. We could copy the string - that seems clumsy, though presumably this code path is only used in exceptional cases.

@ojwb
Copy link
Member Author

ojwb commented Oct 14, 2022

Oh but if we copy it then we'd need _swig_gopanic_impl(M) to know it needs to free(M); and using a std::string only works for C++ and the dtor won't get run anyway.

A fixed sized char array would work but doesn't seem a good approach.

@henrhoi
Copy link

henrhoi commented Nov 14, 2022

Will this issue of not deallocating the memory on panics be resolved? We are experiencing our pods being killed due to OOM.

@ojwb
Copy link
Member Author

ojwb commented Nov 14, 2022

@henrhoi Sadly I think this has reached the limits of my SWIG/Go knowledge. I used essentially the same approach to successfully fix the same issue for Lua, but differences between Go and Lua mean this doesn't trivially translate over as I'd hoped.

Maybe @ianlancetaylor can see a way to resolve this?

@ianlancetaylor
Copy link
Member

The current implementation of _swig_gopanic trims the message to 1024 bytes anyhow, so a fixed-size buffer is not out of the question.

I believe that all the calls of _swig_gopanic that are generated by SWIG use a string constant, so the fixed size buffer is only required for user code that calls _swig_gopanic itself.

@ojwb
Copy link
Member Author

ojwb commented Nov 15, 2022

@ianlancetaylor I can see several places in SWIG/Go which don't just pass a string literal to _swig_gopanic() - it looks like they're all to do with exception handling.

One example (which illustrates the general problem) is:

%typemap(throws) std::exception             %{_swig_gopanic($1.what());%}

Here $1 is the caught exception object, and the value returned by $1.what() is no longer valid after goto fail; with the wrapper code generated using the new structure shown above.

If the value passed is expected to often be a string constant we could potentially optimise handling to avoid the copy and temporary buffer in that case using __builtin_constant_p() or similar. The runtime overhead in this case is probably not important, but every wrapper function using 1KB extra stack space might be is a concern.

@ianlancetaylor
Copy link
Member

Sorry for being unclear. I was only looking at the _swig_gopanic calls that are generated directly by SWIG itself, not at the library calls.

It seems to me that few functions will call _swig_gopanic, and we only have to add the buffer to those that do. But maybe I'm wrong about that--maybe it's called more often than I expect.

Another approach we could take is to have _swig_gopanic call a Go function (like the cgo_panic wrapper we create in the top method). That Go function would convert the byte pointer to a Go string (as cgo_panic does today) and call runtime/cgo.NewHandle to get a handle (that function is available since Go 1.17). The function would return the handle to SWIG. SWIG could then run destructors as in this patch and pass the handle to _swig_gopanic, which would call the Value and Delete methods and then call panic.

@henrhoi
Copy link

henrhoi commented Dec 23, 2022

Thanks, @ianlancetaylor, and @ojwb! Contributing to this PR is beyond my SWIG/Go capabilities. Do you think anyone in the maintainer community would be able to solve this?

We are experiencing the behavior shown below, where our pods are reaching Gbs of memory before going OOM when experiencing a high load of requests that causes errors, even though the service should only need to use a few Mbs.

image

@ojwb
Copy link
Member Author

ojwb commented Jan 9, 2023

It seems to me that few functions will call _swig_gopanic, and we only have to add the buffer to those that do. But maybe I'm wrong about that--maybe it's called more often than I expect.

That's probably true for a typical C API, but if you're wrapping a C++ API which throws exceptions you probably need to catch C++ exceptions in every wrapper function (maybe excluding any which are documented never to throw exceptions) - if you're translating them to panic in go then potentially every wrapper function would have a call to _swig_gopanic.

Do you think anyone in the maintainer community would be able to solve this?

Ian's outlined some solutions to the problem I hit with the lifetime of the panic method. I don't really know any Go but I can try to improve the patch here based on those. Not sure when I'll get to it though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

_swig_gopanic can cause memory leak
3 participants