Memory is not freed after terminating with "torch.OutOfMemoryError: XPU out of memory." #851

@nacui-intel

Describe the bug

CPU: LNL Intel(R) Core(TM) Ultra 5 228V
memory: 32GB
machine: ASUS-EXPERTBOOK-P5405CSA-PX485CSA
torch version: torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 from https://download.pytorch.org/whl/xpu
ipex version: intel-extension-for-pytorch==2.8.10+xpu oneccl_bind_pt==2.8.0+xpu from https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

I ran inference with the vjepa2_vit_large model, optimized by ipex, on the iGPU. It terminated with the error below, even though this machine has 32GB of memory in total. About 15GB of memory was still in use after the termination. The same test runs fine on the CPU.

torch.OutOfMemoryError: XPU out of memory. Tried to allocate 4.00 GiB. GPU 0 has a total capacity of 13.74 GiB. Of the allocated memory 10.10 GiB is allocated by PyTorch, and 3.47 GiB is reserved by PyTorch but unallocated. Please use `empty_cache` to release all unoccupied cached memory.
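
To put a number on the memory that stays in use after the script exits, a small helper can be run once before the test and once after it terminates. This is a minimal sketch, assuming a Linux system where `/proc/meminfo` is readable; `parse_meminfo`, `used_gib` and `read_used_gib` are hypothetical helper names, not part of torch or ipex.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo text into {field: value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def used_gib(fields: dict[str, int]) -> float:
    """Memory in use, in GiB, computed as MemTotal - MemAvailable (both in kB)."""
    return (fields["MemTotal"] - fields["MemAvailable"]) / 1024**2


def read_used_gib(path: str = "/proc/meminfo") -> float:
    """Read the given meminfo file (assumption: Linux) and return memory in use in GiB."""
    return used_gib(parse_meminfo(Path(path).read_text()))
```

Calling `print(f"{read_used_gib():.2f} GiB in use")` before the run and again after the process has exited gives two values whose difference is the memory that was not returned to the system.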

The original model is available at https://github.com/facebookresearch/vjepa2.

Sample code:

    import torch
    import intel_extension_for_pytorch as ipex

    # check_xpu_availability() is a helper from the test script (definition not shown).
    # vjepa2_vit_large comes from the V-JEPA2 repository (import not shown).

    # Check device availability
    use_xpu = check_xpu_availability()
    device = 'xpu' if use_xpu else 'cpu'
    print(f"Using device: {device}")

    # Load V-JEPA2 model
    encoder, predictor = vjepa2_vit_large(pretrained=True)

    # Use the encoder only, in evaluation mode
    model = encoder
    model.eval()

    # Create sample video data for V-JEPA2
    # V-JEPA2 expects input shape: [batch_size, channels, frames, height, width]
    data = torch.rand(1, 3, 64, 256, 256)

    # Move model and data to Intel GPU if available
    if use_xpu:
        print("Moving model and data to Intel XPU...")
        model = model.to('xpu')
        data = data.to('xpu')
        model = ipex.optimize(model, dtype=torch.float32)
    else:
        print("Applying Intel Extension for PyTorch optimizations...")
        model = ipex.optimize(model)
    out = model(data)
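
A rough estimate suggests where the 4.00 GiB request in the error may come from. The sketch below assumes that the ViT-L encoder uses 16x16 spatial patches, a tubelet size of 2 frames and 16 attention heads, and that the full float32 attention score matrix is materialized. These values are assumptions, not read from the model configuration, and `attention_scores_bytes` is a hypothetical helper. The second part uses only the numbers from the error message.

```python
GIB = 1024**3


def attention_scores_bytes(frames, height, width, *, tubelet=2, patch=16,
                           heads=16, bytes_per_elem=4, batch=1):
    """Size of one (batch, heads, tokens, tokens) attention score tensor in bytes.

    tubelet, patch and heads are assumed values for a ViT-L video encoder.
    """
    tokens = (frames // tubelet) * (height // patch) * (width // patch)
    return batch * heads * tokens * tokens * bytes_per_elem


# Input from the sample code: [1, 3, 64, 256, 256] -> 32 * 16 * 16 = 8192 tokens
scores = attention_scores_bytes(64, 256, 256)
print(f"attention scores: {scores / GIB:.2f} GiB")  # attention scores: 4.00 GiB

# Numbers taken from the error message (GiB)
capacity, allocated, reserved = 13.74, 10.10, 3.47
headroom = capacity - allocated - reserved
print(f"free device memory at failure: {headroom:.2f} GiB")  # free device memory at failure: 0.17 GiB
```

Under these assumptions a single attention score tensor is already 4 GiB, which matches the failed allocation, while only about 0.17 GiB of the 13.74 GiB device capacity was left. The match could still be a coincidence if the attention implementation on XPU does not materialize the full matrix.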

Versions

collect_env.py failed with the following log:

Collecting environment information...
Traceback (most recent call last):
  File "/home/collect_env.py", line 616, in <module>
    main()
  File "/home/collect_env.py", line 610, in main
    output = get_pretty_env_info()
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/collect_env.py", line 605, in get_pretty_env_info
    return pretty_str(get_env_info())
                      ^^^^^^^^^^^^^^
  File "/home/collect_env.py", line 449, in get_env_info
    pyenv, pip_list_output = get_python_packages(run_lambda)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/collect_env.py", line 442, in get_python_packages
    pkgs_filtered = filter_python_packages(out)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/collect_env.py", line 407, in filter_python_packages
    for line in data.splitlines()
                ^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'splitlines'
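
Since collect_env.py failed while parsing the package list, the relevant versions can be gathered with the standard library instead. A minimal sketch using `importlib.metadata`; the package list is an assumption based on the packages named in this report, and `installed_versions` is a hypothetical helper.

```python
from importlib import metadata

# Assumption: these are the distributions relevant to this report.
PACKAGES = [
    "torch",
    "torchvision",
    "torchaudio",
    "intel-extension-for-pytorch",
    "oneccl_bind_pt",
]


def installed_versions(names):
    """Return {distribution name: version string, or "not installed"}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}=={version}")
```

This only reports Python package versions; driver and OS details that collect_env.py would normally include are not covered.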
