XPU OOM when allocating a tensor according to its reported available memory #164966

@yao-matrix

Description

🐛 Describe the bug

Run the snippet below:

import torch

torch.xpu.empty_cache()

# Bring up the XPU context; it may occupy some memory.
a = torch.rand(5).to("xpu:0")

# mem_get_info returns (free, total) in bytes; take the free figure.
free_memory_bytes = torch.xpu.mem_get_info("xpu:0")[0]
required_memory_bytes = 5000 * 5000 * (32 // 8)  # a 5000x5000 float32 tensor, ~100 MB

# Leaving 50 MB of free memory for possible buffers, etc.
n_vals = (free_memory_bytes - required_memory_bytes - int(50e6)) // (32 // 8)
foo = torch.rand(n_vals, device="xpu:0")

You'll get an exception like the one below:

Traceback (most recent call last):
File "/workspace/accelerate/./test.py", line 13, in
foo = torch.rand(n_vals, device="xpu:0")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: XPU out of memory. Tried to allocate 63.71 GiB. GPU 0 has a total capacity of 63.98 GiB. Of the allocated memory 512 bytes is allocated by PyTorch, and 2.00 MiB is reserved by PyTorch but unallocated. Please use empty_cache to release all unoccupied cached memory.
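For comparison (not a fix), here is a minimal back-off sketch; the helper name, the 50 MB margin, and the 0.95 shrink factor are illustrative assumptions. It shows one way to probe how much of the free figure returned by torch.xpu.mem_get_info is actually claimable in a single allocation:

import torch

def alloc_near_free(device="xpu:0", safety_margin=int(50e6), shrink=0.95):
    # Hypothetical helper, not part of the report: try to allocate close to
    # the reported free memory, backing off on OOM instead of trusting the
    # mem_get_info figure blindly.
    free_bytes, _total_bytes = torch.xpu.mem_get_info(device)
    n_vals = (free_bytes - safety_margin) // torch.float32.itemsize
    while n_vals > 0:
        try:
            return torch.empty(n_vals, dtype=torch.float32, device=device)
        except torch.OutOfMemoryError:
            torch.xpu.empty_cache()        # drop any cached blocks before retrying
            n_vals = int(n_vals * shrink)  # retry with a smaller request
    raise RuntimeError(f"could not allocate any float32 tensor on {device}")

Running something like this on the failing device would show how far below the reported 63.71 GiB the largest successful allocation actually lands, which may help separate an over-reporting mem_get_info from allocator overhead.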

Versions

Latest XPU PyTorch build.
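A short sketch that would pin down the exact build behind "latest XPU PyTorch" (torch.version.xpu is assumed to be exposed by XPU-enabled builds, hence the getattr guard):

import torch

print(torch.__version__)                        # exact PyTorch build string
print(getattr(torch.version, "xpu", None))      # XPU toolchain version, if exposed by this build
print(torch.xpu.is_available(), torch.xpu.get_device_name(0))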

cc @gujinghui @EikanWang @fengyuan14 @guangyey


Labels

module: memory usage - PyTorch is using more memory than it should, or it is leaking memory
module: xpu - Intel XPU related issues
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
