Open
Labels
module: cuda (Related to torch.cuda, and CUDA support in general)
module: cudnn (Related to torch.backends.cudnn, and CuDNN support)
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)
Description
🐛 Describe the bug
I'd like to report an issue where a simple nn.RNN model runs correctly on CPU but fails on CUDA with a CUDNN_STATUS_BAD_PARAM error during model transfer (.to('cuda')). This suggests a problem with cuDNN parameter initialization during flatten_parameters().
Minimal Reproduction
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(MyModel, self).__init__()
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.rnn(x)
        out = self.fc(out[:, -1, :])
        return out

def my_model_function():
    return MyModel(input_size=10, hidden_size=20, num_layers=2, output_size=5)

def GetInput():
    return torch.randn(4, 8, 10)

if __name__ == "__main__":
    # Runs fine on CPU
    model = my_model_function().to("cpu")
    input_tensor = GetInput().to("cpu")
    output = model(input_tensor)
    print(output.shape)
    print("CPU output ok!")

    # Fails on GPU
    cuda_model = my_model_function().to("cuda")
    cuda_input = GetInput().to("cuda")
    cuda_output = cuda_model(cuda_input)
    print(cuda_output.shape)
    print("GPU output ok!")
Error Trace (sanitized)
RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM
  File ".../torch/nn/modules/rnn.py", line 271, in flatten_parameters
    torch._cudnn_rnn_flatten_weight(...)
  File ".../torch/nn/modules/rnn.py", line 215, in _init_flat_weights
    self.flatten_parameters()
  File ".../torch/nn/modules/rnn.py", line 290, in _apply
    self._init_flat_weights()
  File ".../torch/nn/modules/module.py", line 915, in _apply
    module._apply(fn)
  File ".../torch/nn/modules/module.py", line 1355, in to
    return self._apply(convert)
Observations
- The model uses only standard PyTorch modules (nn.RNN and nn.Linear).
- The model runs fine on CPU.
- It fails immediately upon to('cuda'), during the RNN weight flattening for cuDNN.
- The error does not depend on input data or the forward pass; it happens during .to("cuda").
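One way to narrow this down further (a sketch, not a confirmed fix) is to move a bare nn.RNN to the device with cuDNN turned on and off via the torch.backends.cudnn.flags context manager. The helper name try_move_rnn is made up for illustration. The assumption here is that with cuDNN disabled the weight-flattening path is skipped, so a success in that mode would point at the cuDNN call rather than at the model or CUDA itself.

```python
import torch
import torch.nn as nn


def try_move_rnn(device: str, cudnn_enabled: bool) -> str:
    """Hypothetical helper: build a bare nn.RNN, move it to `device`,
    run one forward pass, and report success or the error message."""
    rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=2, batch_first=True)
    try:
        # Temporarily override the global cuDNN setting for this block only.
        with torch.backends.cudnn.flags(enabled=cudnn_enabled):
            rnn = rnn.to(device)
            out, _ = rnn(torch.randn(4, 8, 10, device=device))
        return f"ok, output shape {tuple(out.shape)}"
    except RuntimeError as exc:
        return f"failed: {exc}"


def main() -> None:
    # CPU baseline, matching the working half of the reproduction.
    print("cpu:", try_move_rnn("cpu", cudnn_enabled=True))
    if torch.cuda.is_available():
        print("cuda, cuDNN on :", try_move_rnn("cuda", cudnn_enabled=True))
        print("cuda, cuDNN off:", try_move_rnn("cuda", cudnn_enabled=False))
    else:
        print("CUDA not available; skipping GPU checks")


if __name__ == "__main__":
    main()
```

If the "cuDNN off" line succeeds while "cuDNN on" fails, that would be consistent with the failure being confined to the cuDNN RNN weight setup in this build.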
Versions
PyTorch version: 2.7.1a0+gite2d141d
Is debug build: True
CUDA used to build PyTorch: 12.6
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35
Python version: 3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.8.0-59-generic-x86_64-with-glibc2.35
Is CUDA available: True
torch.backends.cudnn.version(): 8907
cc @csarofeen @ptrblck @xwang233 @eqy @msaroufim @jerryzh168