
nn.RNN(...).to('cuda') fails with cuDNN error: CUDNN_STATUS_BAD_PARAM on GPU, but works on CPU #155798

@tinywisdom

🐛 Describe the bug

I’d like to report that a simple nn.RNN model runs correctly on CPU but fails on CUDA with a CUDNN_STATUS_BAD_PARAM error. The error is raised while the model is being moved to the GPU (.to('cuda')), before any forward pass, which suggests a problem in the cuDNN weight flattening performed by flatten_parameters().

Minimal Reproduction

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(MyModel, self).__init__()
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.rnn(x)
        out = self.fc(out[:, -1, :])
        return out

def my_model_function():
    return MyModel(input_size=10, hidden_size=20, num_layers=2, output_size=5)

def GetInput():
    return torch.randn(4, 8, 10)

if __name__ == "__main__":
    # Runs fine on CPU
    model = my_model_function().to("cpu")
    input_tensor = GetInput().to("cpu")
    output = model(input_tensor)
    print(output.shape)
    print("CPU output ok!")

    # Fails on GPU
    cuda_model = my_model_function().to("cuda")
    cuda_input = GetInput().to("cuda")
    cuda_output = cuda_model(cuda_input)
    print(cuda_output.shape)
    print("GPU output ok!")

Error Trace (sanitized)

RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM
  File ".../torch/nn/modules/rnn.py", line 271, in flatten_parameters
    torch._cudnn_rnn_flatten_weight(...)
  File ".../torch/nn/modules/rnn.py", line 215, in _init_flat_weights
    self.flatten_parameters()
  File ".../torch/nn/modules/rnn.py", line 290, in _apply
    self._init_flat_weights()
  File ".../torch/nn/modules/module.py", line 915, in _apply
    module._apply(fn)
  File ".../torch/nn/modules/module.py", line 1355, in to
    return self._apply(convert)

Observations

  • The model uses only standard PyTorch modules (nn.RNN and nn.Linear).

  • The model runs fine on CPU.

  • It fails immediately on .to('cuda'), during the RNN weight flattening for cuDNN (flatten_parameters()).

  • The error does not depend on the input data or the forward pass; it is raised before any input reaches the model.
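To narrow this down further, here is a minimal sketch that reports which step fails first (construction, the device transfer, or the forward pass) and repeats the check with cuDNN disabled. The helper name `try_rnn_on` is hypothetical and not part of PyTorch. It assumes that setting `torch.backends.cudnn.enabled = False` makes `flatten_parameters()` skip the cuDNN path; I have not verified that on the build reported in this issue.

```python
import torch
import torch.nn as nn


def try_rnn_on(device, cudnn_enabled=True):
    """Hypothetical helper: return (stage, error_message).

    stage is "ok" on success, otherwise the name of the first step that raised.
    """
    previous = torch.backends.cudnn.enabled
    torch.backends.cudnn.enabled = cudnn_enabled
    stage = "construct"
    try:
        # Same RNN configuration as in the reproduction above.
        rnn = nn.RNN(10, 20, 2, batch_first=True)
        stage = "to(device)"
        rnn = rnn.to(device)
        stage = "forward"
        out, _ = rnn(torch.randn(4, 8, 10, device=device))
        assert out.shape == (4, 8, 20)
    except RuntimeError as exc:
        return stage, str(exc)
    finally:
        # Always restore the global cuDNN flag.
        torch.backends.cudnn.enabled = previous
    return "ok", None


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    for enabled in (True, False):
        stage, err = try_rnn_on(device, cudnn_enabled=enabled)
        print(f"device={device} cudnn_enabled={enabled} -> {stage} {err or ''}")
```

If the run with cuDNN disabled reports "ok" on the GPU while the default run fails at "to(device)", that would point at the cuDNN flattening path rather than at CUDA in general.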

Versions

PyTorch version: 2.7.1a0+gite2d141d
Is debug build: True
CUDA used to build PyTorch: 12.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.8.0-59-generic-x86_64-with-glibc2.35
Is CUDA available: True

torch.backends.cudnn.version(): 8907

cc @csarofeen @ptrblck @xwang233 @eqy @msaroufim @jerryzh168


Labels: module: cuda, module: cudnn, triaged
