-
Notifications
You must be signed in to change notification settings - Fork 6.2k
mgr: load modules in separate python sub-interpreters #14971
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
One notable known problem: if a module fails to load because it can't be found (e.g. sys.path is wrong, which shouldn't happen in real life, or a module name is mis-typed in ceph.conf, which might), the python bit of mgr locks up somehow. The loop in PyModules::init() just kinda stops somehow. |
c4cd997
to
8b5ceca
Compare
(That last was just a rebase) |
8b5ceca
to
f6a135a
Compare
OK, I'm confident in this one now. Each MgrPyModule gets its own python sub-interpreter. The logger, the ceph_state module and sys.path need to be set up separately for each sub-interpreter, so all that happens in MgrPyModule's constructor (previously this was done on the main interpreter in PyModules::init()). Each sub-interpreter has its own python thread state. The main interpreter also has a python thread state, but that is almost unused except for during setup and teardown of the whole beast. Some care needs to be taken to ensure that the right thread state is active at the right time; note how the call to handle_pyerror() in PyModules::init() had to be moved inside MgrPyModle::load(). It's not too bad though, just remember that PyModules should be using the main thread state, and MgrPyModule should always be using its sub-interpreter thread state, except for very early in its constructor and very late in its destructor. The ceph_state module (PyState.cc) naturally has no idea what context it's being run in, so uses PyThreadState_Get() when it needs to know the python thread state. I've also added a little Gil class to make it easy to acquire and release the GIL for whichever thread state you're in. |
Dammit, I just tried using mgr built with this changeset to load the restful module from #14457. It loads fine, but segfaults on shutdown... |
This is a tough one to review because the subinterpreter functionality in python itself is so obscure. It would be nice to spin the Gil class out into a separate commit. Since PyFinalize allegedly (https://docs.python.org/2.7/c-api/init.html#c.Py_EndInterpreter) will clean up all un-terminated sub interpreters, I wonder if the stuff in ~MgrPyModule could just go away -- it doesn't seem that useful to maybe terminate sub interpreters (obviously not your fault, quirk of the framework). When taking http://tracker.ceph.com/issues/19549 into account, we never truly need to do completely clean teardown, it just needs to be torn down far enough for the process to end cleanly. |
Ok, I'll split this up into separate commits. And yeah, I wondered the same about getting rid of the attempted termination in ~MgrPyModule, given the quirks. Now that there's two of us, I'll do that as well (but leave some sort of comment). |
Splitting out the Gil change was a really good idea. Turns out that change by itself exhibited the segfault on shutdown, even without the sub-interpreters, and now I've learned even more than I ever wanted to know about embedding Python :-) Here's what I was missing: not only do we need separate sub-interpreters, we need to manually create an additional python thread state for each OS thread that runs the module's serve function. The PyGILState_*() APIs did that bit automatically, but because I removed those (as they're not supported with sub-interpreters), that extra thread state creation wasn't happening, which resulted in that weird segfault. |
f3b82bc
to
1f9b93d
Compare
OK, that's better. No weird segfaults. I think I'll avoid using the word "confident" this time and instead go with cautious optimism (and flashbacks to when I drew this thing when I first started working on HA). My one outstanding question now, I think, is: are there any other OS threads that need another manually created python thread state? |
1f9b93d
to
f21ba89
Compare
I've moved the additional thread state creation inside the Gil class, but the caller has to pass new_thread == true when we know we're in a new OS thread . IMO it'd be better to do this automatically, which we can do, if we don't mind using something that is theoretically a private member of PyThreadState (see the comments in Gil.h for details). What do you think @jcsp? |
src/mgr/Gil.h
Outdated
: pThreadState(ts), pNewThreadState(nullptr) | ||
{ | ||
// Acquire the GIL, set the current thread state | ||
PyEval_RestoreThread(pThreadState); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if pThreadState (ts) is nullptr? The documentation for PyEvalRestoreThread() says that it must not be called on a NULL pointer. Similarly, as I'm reading it, before we do this we have to be sure that we haven't already been called: what happens if this ctor is invoked twice?
https://docs.python.org/2/c-api/init.html#c.PyEval_RestoreThread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If pThreadState is null it probably dies horribly. I've added an assert (so it'll still die horribly, but on our terms).
As for ensuring we haven't already been called, that's (sadly) up to the caller to be aware of. I've updated the comment to hopefully clarify this a bit more. If you invoke this ctor twice in the same thread it'll deadlock, e.g. do not do this:
void func1() {
Gil gil(pMyThreadState);
// Do some stuff that needs the GIL here
func2(); // this will deadlock
}
void func2() {
Gil gil(pMyThreadState);
// Do some stuff that needs the GIL
}
You can avoid this deadlock like so:
void func1() {
{ // extra scope
Gil gil(pMyThreadState);
// Do some stuff that needs the GIL here
}
func2(); // this won't deadlock
}
In summary, you have to be just as careful using this class as you need to be when using PyEvalRestoreThread(). AFAICT there's no public API for figuring out whether or not the GIL is already held.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
// something that's meant to be a black box. | ||
// | ||
if (new_thread) { | ||
pNewThreadState = PyThreadState_New(pThreadState->interp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the old state need to be released in some way before re-assigning pNewThreadState?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. At this point, pNewThreadState is NULL, and PyThreadState_New creates a new PyThreadState attached to the same interpreter that the existing pThreadState is attached to. It's fine for the two PyThreadState objects to exist at the same time; only one is ever active (the PyThreadState_Swap() calls activate one or the other).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we're now sure pNewThreadState is nullptr, this sounds good. Thanks!
src/mgr/Gil.h
Outdated
Gil(PyThreadState *ts, bool new_thread = false) | ||
: pThreadState(ts), pNewThreadState(nullptr) | ||
{ | ||
assert(pThreadState != nullptr); |
This provides a reasonable amount of isolation between mgr
modules. Notably, with this change, it's possible to have more
than one mgr module use cherrypy.
I consider this implementation to be a bit messy; I think it'd
be neater if the sub-interpreter was created inside MgrPyModule,
rather than in PyModules::init(), but see the comment in that
function for more details.
Also, due to the lack of documentation on python sub-interpreters,
I'm not completely confident that I'm doing everything correctly.
Notably, there's still calls to PyGILState_Ensure() and
PyEval_ReleaseThread() in PyState.cc, and apparently those two
are either not-compatible with sub-interpreters, or at least need
to be handled delicately.
TL;DR: Everything seems to work for me so far, but should be
considered experimental until further notice.
Signed-off-by: Tim Serong [email protected]