leak checker: explain why a tracer is alive by printing a reference chain #13022
nice! cool error message
        getattr(parents(parent)[0], '__dict__', None) is parents(child)[0]):
      parent = parents(parent)[0]
    elif type(parent) is cell_type:
      parent = parents(parents(parent)[0])[0]
Could you add a comment explaining why this is needed for closures? Does it skip the closure and go to the function instead?
Yes, that's the idea: a little shortcut. Without it, this is what gets printed:
<DynamicJaxprTracer 139764213674128> is referred to by <cell 139764213646192>
<cell 139764213646192> is referred to by <tuple 139764213646096>[0]
<tuple 139764213646096> is referred to by <function 139764213640064>
<function 139764213640064> is referred to by <dict 139766096500800>['hi']
<dict 139766096500800> is referred to by <A 139766089094432>.dct
<A 139766089094432> is referred to by __main__.a
but with it we get:
<DynamicJaxprTracer 140139354034480> is referred to by <function 140139405110576> (foo) closed-over variable x
<function 140139405110576> is referred to by <dict 140141242837056>['hi']
<dict 140141242837056> is referred to by <A 140141237605664>.dct
<A 140141237605664> is referred to by __main__.a
I'll add a comment.
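The shortcut works because a closure's cells are reachable from the function in a fixed way: in Python, `fn.__closure__` is a tuple of cell objects (the `<tuple ...>[0]` hop in the first chain), and `fn.__code__.co_freevars` lists the matching variable names in the same order. A minimal sketch of recovering the variable name, using hypothetical helpers `closed_over_name` and `describe_closure_ref` (not the PR's code) and a plain `object()` standing in for the tracer:

```python
def make_foo(x):
    def foo():
        return x  # `x` is a free variable of foo, stored in a cell
    return foo


def closed_over_name(fn, value):
    """Hypothetical helper: name of the free variable of `fn` whose cell holds `value`.

    Pairs fn.__code__.co_freevars with fn.__closure__, which are kept in the same order.
    """
    for name, cell in zip(fn.__code__.co_freevars, fn.__closure__ or ()):
        try:
            if cell.cell_contents is value:
                return name
        except ValueError:  # empty cell: the variable was never assigned
            continue
    return None


def describe_closure_ref(value, fn):
    """Collapse value <- cell <- __closure__ tuple <- function into one line."""
    name = closed_over_name(fn, value)
    return (f'<{type(value).__name__} {id(value)}> is referred to by '
            f'<function {id(fn)}> ({fn.__name__}) closed-over variable {name}')


tracer_stand_in = object()  # hypothetical stand-in for a leaked tracer
foo = make_foo(tracer_stand_in)
print(describe_closure_ref(tracer_stand_in, foo))
```

Skipping the cell and tuple hops loses nothing a reader needs, since the variable name already identifies which cell held the object.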
    # For namespaces (like modules and class instances) and closures, the
    # references may form a simple chain: e.g. instance refers to its own
    # __dict__ which refers to child, or function refers to its __closure__
    # which refers to cells which refer to child. In these cases, we can provide
    # a more intuitive description by collapsing the chain into a single
    # parent->child jump. We do that by setting `parent` here to be a
    # grandparent (or great-grandparent) of `child`, and then handling that case
    # in _why_alive_container_info.
    # To prevent this collapsing behavior, just comment out this code block.
    elif type(parent) is cell_type:
      parent = parents(parents(parent)[0])[0]

  line = f'<{type(child).__name__} {id(child)}> is referred to by '
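Once `parent` has been moved up to the namespace owner, describing the hop means finding the key, index, or attribute through which `parent` holds `child`. A minimal sketch of such a description function, assuming a hypothetical `why_alive_line(parent, child)` (it is not `_why_alive_container_info`, whose body is not shown here), that reproduces the `['hi']`, `[0]`, `.dct`, and `__main__.a` styles from the printed chains:

```python
import types


def short_repr(obj):
    # Same format as the diff: type name plus id().
    return f'<{type(obj).__name__} {id(obj)}>'


def why_alive_line(parent, child):
    """Hypothetical: describe how `parent` refers to `child`.

    Modules and instances are described through their __dict__, which
    collapses parent -> __dict__ -> child into a single jump.
    """
    prefix = f'{short_repr(child)} is referred to by '
    if isinstance(parent, dict):
        for key, value in parent.items():
            if value is child:
                return f'{prefix}{short_repr(parent)}[{key!r}]'
    elif isinstance(parent, (list, tuple)):
        for i, value in enumerate(parent):
            if value is child:
                return f'{prefix}{short_repr(parent)}[{i}]'
    elif isinstance(getattr(parent, '__dict__', None), dict):
        # Modules are named by their module name, like `__main__.a`.
        owner = (parent.__name__ if isinstance(parent, types.ModuleType)
                 else short_repr(parent))
        for name, value in vars(parent).items():
            if value is child:
                return f'{prefix}{owner}.{name}'
    # Fall back to naming the parent without saying where the reference lives.
    return f'{prefix}{short_repr(parent)}'


class A:
    def __init__(self):
        self.dct = {}


a = A()
leaked = object()  # hypothetical stand-in for a leaked tracer
a.dct['hi'] = [leaked]
print(why_alive_line(a.dct['hi'], leaked))
print(why_alive_line(a.dct, a.dct['hi']))
print(why_alive_line(a, a.dct))
```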
For my understanding: why is the object ID useful when debugging? IIUC there will be no cycles in this trace, or might there be?
There shouldn't be cycles in the printed result.
I found printing the id useful because I tend to cross-reference ids against print statements I put elsewhere, or against interactive pdb spelunking. It's also a meaningful way to distinguish different instances of, e.g., a dict.
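To illustrate the last point: two dicts with equal contents compare equal and have the same repr, but the `<type id>` prefix used in the chain tells them apart. A tiny sketch with a hypothetical `short_repr` helper mirroring the format string in the diff:

```python
def short_repr(obj):
    """Hypothetical helper: the '<type id>' format used in the reference chain."""
    return f'<{type(obj).__name__} {id(obj)}>'


first = {'hi': 1}
second = {'hi': 1}
print(first == second, repr(first) == repr(second))  # equal contents, equal reprs
print(short_repr(first), short_repr(second))         # distinct ids
```

Note that `id()` is only guaranteed unique among objects alive at the same time, so ids are best cross-referenced within a single session while the objects still exist.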
Co-authored-by: Qiao Zhang <[email protected]>
Co-authored-by: Roy Frostig <[email protected]>
The idea here is that if you know which data structure(s) a leaked tracer is in, that might help you understand which side effect caused the problem.
With this code:
instead of just this error:
we now get this error:
That is, when the leak checker finds (via gc) a Tracer that has outlived its trace, we attempt to walk back up the referrer chain for that Tracer and describe the objects we encounter (and even at which indices/names). In the example code, that might make us notice that the global object, and our update to dct, is a problem.
We should also add info about which transformation or primitive the Tracer corresponded to (i.e. point to the last line of the file here). That may be in a follow-up PR, though.
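The referrer walk described above can be sketched with the standard gc module: `gc.get_referrers(obj)` returns the gc-tracked objects that refer to `obj`. The sketch below is a minimal illustration, not the PR's implementation; it assumes a hypothetical `referrer_chain` helper and a caller-supplied list of root objects, and does a breadth-first search upward from the leaked object until it reaches a root, skipping stack frames and its own bookkeeping containers.

```python
import gc
import types
from collections import deque


def short_repr(obj):
    return f'<{type(obj).__name__} {id(obj)}>'


def referrer_chain(target, roots, max_depth=10):
    """Hypothetical: search up the referrer graph from `target` to one of `roots`.

    Returns [target, referrer, referrer-of-referrer, ..., root], or None if
    no root is reachable within `max_depth` hops.
    """
    root_ids = {id(r) for r in roots}
    seen = {id(target): target}     # id -> object; keeps ids from being reused
    referent_of = {}                # id(referrer) -> id(object it refers to)
    depth = {id(target): 0}
    queue = deque([id(target)])     # holds ints only, so it never appears as a referrer
    ignore = {id(seen), id(roots)}  # our own containers that refer to visited objects
    while queue:
        obj_id = queue.popleft()
        if obj_id in root_ids:
            # Follow referent_of from the root back down to the target.
            chain = [seen[obj_id]]
            while chain[-1] is not target:
                chain.append(seen[referent_of[id(chain[-1])]])
            return chain[::-1]
        if depth[obj_id] >= max_depth:
            continue
        for ref in gc.get_referrers(seen[obj_id]):
            # Frames (including this search's own) refer to everything they touch.
            if id(ref) in ignore or id(ref) in seen or isinstance(ref, types.FrameType):
                continue
            seen[id(ref)] = ref
            referent_of[id(ref)] = obj_id
            depth[id(ref)] = depth[obj_id] + 1
            queue.append(id(ref))
    return None


class Leaky:  # hypothetical stand-in for a leaked Tracer
    pass


holder = {'hi': [Leaky()]}
found = referrer_chain(holder['hi'][0], [holder])
for child, parent in zip(found, found[1:]):
    print(f'{short_repr(child)} is referred to by {short_repr(parent)}')
```

Breadth-first order returns a shortest chain, which keeps the printed explanation short. The depth limit matters because each `gc.get_referrers` call scans every gc-tracked object, and references held by untracked objects are invisible to it.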