`nn.RNN(...).to('cuda')` fails with `cuDNN error: CUDNN_STATUS_BAD_PARAM` on GPU, but works on CPU

### 🐛 Describe the bug

I’d like to report an issue where a simple `nn.RNN` model runs correctly on CPU but fails on CUDA with `CUDNN_STATUS_BAD_PARAM` while the model is being moved to the GPU (`.to('cuda')`). The traceback ends in `torch._cudnn_rnn_flatten_weight`, called from `flatten_parameters()`. This suggests the problem is in the cuDNN weight-flattening step rather than in the forward pass.

## Minimal Reproduction
```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(MyModel, self).__init__()
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.rnn(x)
        out = self.fc(out[:, -1, :])
        return out

def my_model_function():
    return MyModel(input_size=10, hidden_size=20, num_layers=2, output_size=5)

def GetInput():
    return torch.randn(4, 8, 10)

if __name__ == "__main__":
    # Runs fine on CPU
    model = my_model_function().to("cpu")
    input_tensor = GetInput().to("cpu")
    output = model(input_tensor)
    print(output.shape)
    print("CPU output ok!")

    # Fails on GPU
    cuda_model = my_model_function().to("cuda")
    cuda_input = GetInput().to("cuda")
    cuda_output = cuda_model(cuda_input)
    print(cuda_output.shape)
    print("GPU output ok!")
```
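One quick way to check whether the failure is specific to cuDNN, and not to the CUDA RNN path in general, is to run the same stack with cuDNN disabled. `torch.backends.cudnn.flags(enabled=False)` is an existing context manager. With it active, `flatten_parameters()` should return early and the CUDA RNN should fall back to PyTorch's native kernels. `run_rnn` below is a hypothetical helper. This is a diagnostic sketch and possible stopgap, not a fix.

```python
import torch
import torch.nn as nn


def run_rnn(device: str, use_cudnn: bool) -> tuple:
    """Build the repro's RNN + Linear stack on `device` and run one forward pass.

    With use_cudnn=False, flatten_parameters() skips the cuDNN flattening call
    and CUDA RNNs use PyTorch's native kernels. The flag has no effect on CPU.
    """
    with torch.backends.cudnn.flags(enabled=use_cudnn):
        rnn = nn.RNN(10, 20, 2, batch_first=True).to(device)
        fc = nn.Linear(20, 5).to(device)
        x = torch.randn(4, 8, 10, device=device)
        out, _ = rnn(x)
        return tuple(fc(out[:, -1, :]).shape)


if __name__ == "__main__":
    print("cpu:", run_rnn("cpu", use_cudnn=True))
    if torch.cuda.is_available():
        print("cuda, cuDNN off:", run_rnn("cuda", use_cudnn=False))
        try:
            print("cuda, cuDNN on:", run_rnn("cuda", use_cudnn=True))
        except RuntimeError as err:
            print("cuda, cuDNN on failed:", err)
```

If the cuDNN-off run succeeds and the cuDNN-on run fails, that points at the cuDNN flattening path specifically.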

## Error Trace (sanitized)
```
RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM
  File ".../torch/nn/modules/rnn.py", line 271, in flatten_parameters
    torch._cudnn_rnn_flatten_weight(...)
  File ".../torch/nn/modules/rnn.py", line 215, in _init_flat_weights
    self.flatten_parameters()
  File ".../torch/nn/modules/rnn.py", line 290, in _apply
    self._init_flat_weights()
  File ".../torch/nn/modules/module.py", line 915, in _apply
    module._apply(fn)
  File ".../torch/nn/modules/module.py", line 1355, in to
    return self._apply(convert)
```
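If the failure really comes from cuDNN weight flattening, it should also reproduce when `flatten_parameters()` is called on its own, outside `.to()`. The sketch below uses a hypothetical helper, `flatten_after_move`. It moves the weights to CUDA with cuDNN disabled, so the flatten inside `_apply` shown in the trace above should be skipped. It then calls `flatten_parameters()` directly with cuDNN enabled again. If this raises the same `CUDNN_STATUS_BAD_PARAM`, the device transfer itself is probably not at fault.

```python
import torch
import torch.nn as nn


def flatten_after_move() -> str:
    """Move an nn.RNN to CUDA with cuDNN disabled, then flatten explicitly.

    Returns "no-cuda" when CUDA is unavailable, "ok" on success, or the text
    of the RuntimeError raised by the explicit flatten_parameters() call.
    """
    if not torch.cuda.is_available():
        return "no-cuda"
    with torch.backends.cudnn.flags(enabled=False):
        # cuDNN is disabled here, so the flatten triggered by .to() returns early.
        rnn = nn.RNN(10, 20, 2, batch_first=True).to("cuda")
    try:
        # cuDNN is restored to its previous setting (enabled by default), so this
        # call should reach torch._cudnn_rnn_flatten_weight.
        rnn.flatten_parameters()
    except RuntimeError as err:
        return str(err)
    return "ok"


if __name__ == "__main__":
    print(flatten_after_move())
```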

## Observations
+ The model uses only standard PyTorch modules (`nn.RNN` and `nn.Linear`).
+ The model runs fine on CPU.
+ It fails immediately upon `.to('cuda')`, while the `RNN` weights are being flattened for cuDNN.
+ The error does not depend on the input data or the forward pass; it is raised during `.to('cuda')` itself.
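Because the error is raised by `.to('cuda')` alone, one way to narrow it down is to move bare `nn.RNN` modules of a few different shapes and record which ones fail. `sweep_rnn_moves` is a hypothetical helper. Its grid values are illustrative assumptions and do not come from this report, although the grid does include the repro's configuration (2 layers, hidden size 20, unidirectional).

```python
import itertools

import torch
import torch.nn as nn


def sweep_rnn_moves(device: str = "cuda") -> dict:
    """Move bare nn.RNN modules of several shapes to `device`; no forward pass.

    Returns {(num_layers, hidden_size, bidirectional): "ok" or error text}.
    Returns an empty dict when a CUDA device is requested but unavailable.
    """
    if device.startswith("cuda") and not torch.cuda.is_available():
        return {}
    results = {}
    for num_layers, hidden_size, bidirectional in itertools.product([1, 2], [8, 20], [False, True]):
        rnn = nn.RNN(
            input_size=10,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            bidirectional=bidirectional,
        )
        key = (num_layers, hidden_size, bidirectional)
        try:
            rnn.to(device)  # for CUDA targets, _apply runs flatten_parameters()
            results[key] = "ok"
        except RuntimeError as err:
            results[key] = str(err)
    return results


if __name__ == "__main__":
    results = sweep_rnn_moves("cuda")
    if not results:
        print("CUDA not available")
    for config, status in results.items():
        print(config, status)
```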



### Versions

<details> <summary>Click to expand log</summary>

```
PyTorch version: 2.7.1a0+gite2d141d
Is debug build: True
CUDA used to build PyTorch: 12.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.8.0-59-generic-x86_64-with-glibc2.35
Is CUDA available: True

torch.backends.cudnn.version(): 8907
```

</details>
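The version log above does not list the GPU model or compute capability. These may matter for a cuDNN-specific failure, especially on a debug build. The full `python -m torch.utils.collect_env` output usually includes them. As an alternative, the minimal sketch below (hypothetical helper `cuda_env_summary`) prints the relevant fields using only standard `torch` attributes.

```python
import torch


def cuda_env_summary() -> dict:
    """Collect the build and runtime details most relevant to a cuDNN RNN failure."""
    info = {
        "torch": torch.__version__,
        "debug_build": torch.version.debug,
        "cuda_build": torch.version.cuda,
        "cudnn_version": torch.backends.cudnn.version(),  # None if cuDNN is unavailable
        "cudnn_enabled": torch.backends.cudnn.enabled,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        # GPU model and compute capability of device 0.
        info["device_name"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in cuda_env_summary().items():
        print(f"{key}: {value}")
```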

cc @csarofeen @ptrblck @xwang233 @eqy @msaroufim @jerryzh168
