Refs #16022 - Use a weak reference for file field memory optimization #15592

massover · 2022-04-13T15:28:32Z

I believe that the tests should all pass. However, I have added a test to my example project that fails. I believe this is breaking if the weak ref is lost while trying to use FileField.save. The documented api itself is cyclic which is going to be tricky.

Let me show the performance difference I observed.

before:

(the drop is a new deployment)

after:

(the drop is a new deployment)

For my application, the memory utilization and latency saw such huge improvements. The memory utilization is evident in the graphs. The latency improvements come mostly in P99 (requests that are unlucky to catch gc). As @carltongibson notes, it's not a leak per se. It is eventually garbage collected. However, in my application, we were building up so much gen 2 garbage that when gc eventually starts, the latency takes a huge hit. With no reference cycle, there is less garbage.

Since this seems like a breaking change as far as I can see, I would hope that some other people can try this out and see if it makes a noticeable difference in their application (as it did for mine). Here is some code that should patch:

def patch_field_file():
    import weakref
    from django.db.models.fields.files import FieldFile

    def __init__(self, instance, field, name):
        super(FieldFile, self).__init__(None, name)
        self.instance = weakref.proxy(instance)
        self.field = field
        self.storage = field.storage
        self._committed = True

    FieldFile.__init__ = __init__

The way to verify it would be using your existing memory metrics (!) or using something like psutil in code (such as a middleware) and printing it.

class psutilMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response
        self.process = psutil.Process(os.getpid())
        self.memory_start = process.memory_full_info().rss / 1024 ** 2

    def __call__(self, request):
        response = self.get_response(request)
        memory_end = process.memory_full_info().rss / 1024 ** 2
        memory_diff = memory_end - self.memory_start
        print(f"{self.memory_start=}, {memory_end=},  {memory_diff=}")
        return response

I get that we all choose python and make tradeoffs on performance in this interpreted garbage collected language. For this improvement, I'm looking for data next, not to imply that because it made a huge impact on my project that it's the way to go. It would be nice if someone else could give their input on a real project! As an example, I believe that something like the following could be affected:

A User model and api endpoint with an "avatar" file field. If an instance of user has lots of other data on it, I believe that every get request to api/user/:id/ would be creating lots of garbage, which could be increasing request times and worker memory usage non trivially.

If the api can't break, and we'd like to move forward, here is a suggestion

descriptor based compatibility flag
global settings compatibility flag

class FieldFile(File):
    def __init__(self, instance, field, name):
        super().__init__(None, name)
        
        if field.weak:
           self.instance = weakref.proxy(instance)
        elif settings.FILE_FIELD_WEAKREF_COMPAT:
           self.instance =  weakref.proxy(instance)
        else:
           self.instance = instance     
        self.field = field
        self.storage = field.storage
        self._committed = True
        
 # where we can initialize a file field like:
 
 class Example(models.Model):
     f = models.FileField(weak=True)

carltongibson · 2022-06-22T13:05:37Z

Hi @massover — Thanks again for this: good stuff.

However, I have added a test to my example project that fails. I believe this is breaking if the weak ref is lost while trying to use FileField.save. The documented api itself is cyclic which is going to be tricky.

So — yes this is the tricky bit — if you have only the reference to the FieldFile in scope then both the save() and delete() will be broken, since both reach back to save the instance by default. This is documented and is presumably the intended usage, since otherwise we'd just have save the instance after updating the file, which sounds clunky.

It's an open question as to how often folks would hit this: i.e. in real-life is the instance normally still in scope, and so still live, so this problem wouldn't come up? But I think a decent error message in each case, and tests on that would be a good starting point.

I like the idea of allowing the option to use a weakref on FileFiled here.

Without new API (but with new docs...) we could subclass FileFiled and set attr_class to a (say) WeakFieldFile — but it may be your approach is cleaner. I'd like to get other opinions before we settle — I think it depends on what, if any, changes are needed in save() and delete()... (I did consider using ref() and then having a @property instance to deference that, but likely proxy is sufficient.)

In summary:

I think there's a clear case to offer the weak option.
Maybe there's a stronger case to make it the default.
And if so, maybe we could deprecate the non-weak version.

Both the latter two need wider discussion/agreement I think.

Are you happy to push forward on the first part at least?

Thanks again! 🏅

massover · 2022-06-24T19:18:19Z

Hi @massover — Thanks again for this: good stuff.

However, I have added a test to my example project that fails. I believe this is breaking if the weak ref is lost while trying to use FileField.save. The documented api itself is cyclic which is going to be tricky.

So — yes this is the tricky bit — if you have only the reference to the FieldFile in scope then both the save() and delete() will be broken, since both reach back to save the instance by default. This is documented and is presumably the intended usage, since otherwise we'd just have save the instance after updating the file, which sounds clunky.

It's an open question as to how often folks would hit this: i.e. in real-life is the instance normally still in scope, and so still live, so this problem wouldn't come up? But I think a decent error message in each case, and tests on that would be a good starting point.

I like the idea of allowing the option to use a weakref on FileFiled here.

Without new API (but with new docs...) we could subclass FileFiled and set attr_class to a (say) WeakFieldFile — but it may be your approach is cleaner. I'd like to get other opinions before we settle — I think it depends on what, if any, changes are needed in save() and delete()... (I did consider using ref() and then having a @property instance to deference that, but likely proxy is sufficient.)

In summary:

I think there's a clear case to offer the weak option.

Maybe there's a stronger case to make it the default.

And if so, maybe we could deprecate the non-weak version.

Both the latter two need wider discussion/agreement I think.

Are you happy to push forward on the first part at least?

Thanks again! 🏅

Yes I can push forward on the first part. Thank you for the review!

massover · 2022-06-29T02:25:26Z

@carltongibson This pr now includes a FileField with a weak kwarg.

documentation
should we raise a custom exception or rely on the built in one?

======================================================================
ERROR: test_weak_file_field_save_out_of_scope (model_fields.test_filefield.FileFieldTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/massover/Personal/django/tests/model_fields/test_filefield.py", line 204, in test_weak_file_field_save_out_of_scope
    weakfile.save("new-name", new)
  File "/Users/massover/Personal/django/django/db/models/fields/files.py", line 97, in save
    setattr(self.instance, self.field.attname, self.name)
ReferenceError: weakly-referenced object no longer exists

Lastly, I updated the little example project with a command that shows the data sizes where the reference cycle starts to matter

1kb, memory_start=144.12109375MB, memory_end=144.8359375MB,  memory_diff=0.71484375MB
10kb, memory_start=144.83984375MB, memory_end=145.79296875MB,  memory_diff=0.953125MB
100kb, memory_start=145.89453125MB, memory_end=150.484375MB,  memory_diff=4.58984375MB
1mb, memory_start=148.53125MB, memory_end=203.13671875MB,  memory_diff=54.60546875MB
10mb, memory_start=183.9921875MB, memory_end=724.546875MB,  memory_diff=540.5546875MB
100mb, memory_start=534.40234375MB, memory_end=4619.51171875MB,  memory_diff=4085.109375MB

djantzen · 2024-07-11T17:52:49Z

I've encountered this problem, I think. In my scenario, I see extremely high memory consumption -- something like 10 to 25 times the memory used compared to the original file size -- but only when our logging middleware from django-request-logging is in play. If I remove the middleware entirely, then memory consumption is about 1.5x file size. If I disable logging on the upload view method using the @no_logging decorator, memory consumption is "only" about 5x file size. I'm going to try the weakref patching strategy next.

Use a weak reference for memory optimization

6821ab5

massover changed the title ~~Refs #16022 - Use a weak reference for memory optimization~~ Refs #16022 - Use a weak reference for file field memory optimization Apr 13, 2022

carltongibson requested a review from a team April 13, 2022 15:33

carltongibson removed the request for review from a team June 22, 2022 13:09

Add weak option to FileField

01f4ef5

massover force-pushed the issues/16022 branch from 0e2329a to 01f4ef5 Compare June 29, 2022 01:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Refs #16022 - Use a weak reference for file field memory optimization #15592

Refs #16022 - Use a weak reference for file field memory optimization #15592

massover commented Apr 13, 2022 •

edited

Loading

Uh oh!

carltongibson commented Jun 22, 2022 •

edited

Loading

Uh oh!

massover commented Jun 24, 2022

Uh oh!

massover commented Jun 29, 2022

Uh oh!

djantzen commented Jul 11, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Refs #16022 - Use a weak reference for file field memory optimization #15592

Are you sure you want to change the base?

Refs #16022 - Use a weak reference for file field memory optimization #15592

Conversation

massover commented Apr 13, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

carltongibson commented Jun 22, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

massover commented Jun 24, 2022

Uh oh!

massover commented Jun 29, 2022

Uh oh!

djantzen commented Jul 11, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

massover commented Apr 13, 2022 •

edited

Loading

carltongibson commented Jun 22, 2022 •

edited

Loading