Open
Labels
module: correctness (silent): issue that returns an incorrect result silently
module: cpp-extensions: Related to torch.utils.cpp_extension
needs reproduction: Ensure you have actionable steps to reproduce the issue. Someone else needs to confirm the repro.
triaged: This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Description
🐛 Describe the bug
Steps to reproduce:
$ uv venv /raid/youkaichao/uv_envs/tmp_test_cudart --python 3.12 --seed
$ source /raid/youkaichao/uv_envs/tmp_test_cudart/bin/activate
$ uv pip install torch==2.9.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu128
$ export LIBTORCH=/raid/youkaichao/uv_envs/tmp_test_cudart/lib/python3.12/site-packages/torch
$ export CUDA_PATH=/usr/local/cuda-13.0/
$ export LD_LIBRARY_PATH=${CUDA_PATH}/lib64:${LIBTORCH}/lib:$LD_LIBRARY_PATH
$ ${CUDA_PATH}/bin/nvcc sm_query.cu -std=c++17 \
    -I${LIBTORCH}/include \
    -I${LIBTORCH}/include/torch/csrc/api/include \
    -L${LIBTORCH}/lib \
    -ltorch_cuda -lc10_cuda -ltorch_cpu -lc10 \
    -L${CUDA_PATH}/lib64 -lcudart \
    -o sm_query
The sm_query.cu file:
#include <iostream>
#include <ATen/cuda/CUDAContext.h>

int main() {
    if (!at::cuda::is_available()) {
        std::cerr << "CUDA is not available.\n";
        return 1;
    }
    const cudaDeviceProp* prop = at::cuda::getCurrentDeviceProperties();
    std::cout << "Device: " << prop->name << '\n';
    std::cout << "Streaming Multiprocessors (SM): "
              << prop->multiProcessorCount << '\n';
    return 0;
}
Run the binary:
$ ./sm_query
Device: NVIDIA B200
Streaming Multiprocessors (SM): 1
The reported SM count should be 148, not 1.
Digging deeper, I found that this happens because torch links against libcudart.so.12, while nvcc from CUDA 13.0 picks up libcudart.so.13. The two libraries seem to conflict with each other.
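One way such a conflict could silently produce a wrong field value is a struct layout mismatch: the code that fills the struct and the header used to read it disagree on where a field lives. This is only a possible mechanism and is not confirmed in this report. The sketch below uses two hypothetical structs, `PropOld` and `PropNew`, which are not the real `cudaDeviceProp` definitions; every field name and value in it is made up for illustration.

```cpp
#include <cstring>

// Hypothetical layouts for illustration only, NOT the real cudaDeviceProp.
struct PropOld {                 // layout assumed by the code that fills the struct
    char name[256];
    int  clock_rate;             // field present only in the old layout
    int  multi_processor_count;
};

struct PropNew {                 // layout assumed by the header used at compile time
    char name[256];
    int  multi_processor_count;  // now sits where clock_rate used to be
    int  reserved;
};

static_assert(sizeof(PropOld) == sizeof(PropNew), "sketch assumes equal sizes");

// Reinterpret bytes written with the old layout through the new layout.
int read_sm_count_with_new_layout(const PropOld& filled) {
    PropNew view;
    std::memcpy(&view, &filled, sizeof(view));
    return view.multi_processor_count;
}

// Build a struct the way the "old" side would, with made-up values.
PropOld make_filled_example() {
    PropOld p{};
    std::strcpy(p.name, "example device");
    p.clock_rate = 1000;
    p.multi_processor_count = 148;
    return p;
}
```

With these made-up values, `read_sm_count_with_new_layout(make_filled_example())` returns 1000 rather than 148: the reader picks up whatever bytes happen to sit at the offset its own header expects, and no error is raised.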
I get an ld warning too:
/usr/bin/ld: warning: libcudart.so.12, needed by /raid/youkaichao/uv_envs/tmp_test_cudart/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so, may conflict with libcudart.so.13
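To check at run time whether both runtimes really end up in the same process, the loaded shared objects can be listed from inside the binary. The following is a minimal sketch assuming Linux with glibc, where `dl_iterate_phdr` is available; the helper names `is_cudart_path` and `loaded_cudart_libraries` are hypothetical and not part of torch or CUDA.

```cpp
#include <link.h>     // dl_iterate_phdr (glibc, Linux only)
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// True if a loaded object's path looks like a CUDA runtime library.
bool is_cudart_path(const char* path) {
    return path != nullptr && std::strstr(path, "libcudart") != nullptr;
}

// Callback for dl_iterate_phdr: append matching paths to the vector in `data`.
static int collect_cudart(struct dl_phdr_info* info, std::size_t, void* data) {
    if (is_cudart_path(info->dlpi_name)) {
        static_cast<std::vector<std::string>*>(data)->emplace_back(info->dlpi_name);
    }
    return 0;  // returning 0 keeps the iteration going
}

// Paths of every libcudart shared object currently mapped into this process.
std::vector<std::string> loaded_cudart_libraries() {
    std::vector<std::string> found;
    dl_iterate_phdr(collect_cudart, &found);
    return found;
}
```

Calling `loaded_cudart_libraries()` at the start of `main` in sm_query.cu and printing each entry would show which runtimes are mapped. Seeing both a `libcudart.so.12` and a `libcudart.so.13` path would be consistent with the ld warning above, though it does not by itself prove which one served the device-properties call.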
Versions
PyTorch 2.9 rc1