Conversation

cyyever
Collaborator

@cyyever cyyever commented May 29, 2025

Use CUDA language in CMake and remove forked FindCUDAToolkit.cmake.
Some CUDA targets are also renamed with torch:: prefix.

cc @albanD


pytorch-bot bot commented May 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/154595

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 1 Unrelated Failure

As of commit ae40a3d with merge base cf4964b:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@cyyever cyyever marked this pull request as draft May 29, 2025 05:30
@cyyever
Collaborator Author

cyyever commented May 29, 2025

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label May 29, 2025
@cyyever cyyever force-pushed the cuda_language branch 2 times, most recently from e67982f to c3f0359 Compare May 29, 2025 05:55
@cyyever cyyever force-pushed the cuda_language branch 7 times, most recently from 4037656 to 9d5da10 Compare May 29, 2025 09:34
@cyyever cyyever marked this pull request as ready for review June 14, 2025 23:35
@cyyever cyyever changed the title Cuda language Use CUDA language in CMake Jun 14, 2025
@cyyever cyyever changed the title Use CUDA language in CMake Use official CUDAToolkit module in CMake Jun 14, 2025
@cyyever cyyever force-pushed the cuda_language branch 2 times, most recently from ed247ae to 22bb4d5 Compare June 15, 2025 08:04
albanD
albanD previously approved these changes Jun 16, 2025
Collaborator

@albanD albanD left a comment


Sorry for the delay on reviewing this, my review queue has been pretty backed up.
This is AMAZING!!!

The change sounds good to me (even though I'm in no way a CMake expert), but if CI/CD is happy (including the cpp extensions tests), I think we're good to go.

Let's try and land this as is!

@cyyever cyyever marked this pull request as draft June 16, 2025 23:59
@cyyever
Collaborator Author

cyyever commented Jun 16, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased cuda_language onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout cuda_language && git pull --rebase)

@cyyever
Collaborator Author

cyyever commented Jun 23, 2025

@ngimel Detection of the native GPU architecture could be changed to set(CUDA_ARCHITECTURES "native"), which essentially passes native through to nvcc. The old behavior of printing the detected architectures requires CUDA_DETECT_INSTALLED_GPUS, which is unfortunately deprecated; see https://gitlab.kitware.com/cmake/cmake/-/issues/19199.

One fix is to use the CUDA_ARCHITECTURES target property, as documented at https://cmake.org/cmake/help/latest/prop_tgt/CUDA_ARCHITECTURES.html.
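A minimal sketch of this approach (assuming CMake >= 3.24, which understands the "native" value; the target name torch_cuda here is only illustrative):

```cmake
# Sketch only: let nvcc detect the visible GPUs instead of relying on the
# deprecated CUDA_DETECT_INSTALLED_GPUS macro. CMAKE_CUDA_ARCHITECTURES
# initializes the per-target CUDA_ARCHITECTURES property.
set(CMAKE_CUDA_ARCHITECTURES native)

# Or per target (torch_cuda stands in for whichever CUDA target is configured):
set_target_properties(torch_cuda PROPERTIES CUDA_ARCHITECTURES native)
```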

@cyyever
Collaborator Author

cyyever commented Jun 23, 2025

Concretely, this snippet in CMake is wrong:

      if(CUDA_LIMIT_GPU_ARCHITECTURE AND ITEM VERSION_GREATER_EQUAL CUDA_LIMIT_GPU_ARCHITECTURE)
        list(GET CUDA_COMMON_GPU_ARCHITECTURES -1 NEWITEM)
        string(APPEND CUDA_GPU_DETECT_OUTPUT_FILTERED " ${NEWITEM}")
      else()
        string(APPEND CUDA_GPU_DETECT_OUTPUT_FILTERED " ${ITEM}")
      endif()

as it either incorrectly sets CUDA_LIMIT_GPU_ARCHITECTURE or does an incorrect comparison here, and thus sets the architecture to the last entry of CUDA_COMMON_GPU_ARCHITECTURES. I'm on CMake 3.27, but I've seen the same behavior on 4.0.
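One possible tightening, shown purely as a sketch (this is not necessarily what the actual fix does), is to skip the clamping branch unless the limit variable is genuinely defined:

```cmake
# Sketch only: apply the clamp only when a limit was explicitly set,
# so an unset CUDA_LIMIT_GPU_ARCHITECTURE never rewrites the detected arch.
if(DEFINED CUDA_LIMIT_GPU_ARCHITECTURE
   AND ITEM VERSION_GREATER_EQUAL CUDA_LIMIT_GPU_ARCHITECTURE)
  list(GET CUDA_COMMON_GPU_ARCHITECTURES -1 NEWITEM)
  string(APPEND CUDA_GPU_DETECT_OUTPUT_FILTERED " ${NEWITEM}")
else()
  string(APPEND CUDA_GPU_DETECT_OUTPUT_FILTERED " ${ITEM}")
endif()
```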

Fixed, see commit 3f789e9. Also note that this bug existed before this PR but was only revealed by these changes.

@ngimel
Collaborator

ngimel commented Jun 24, 2025

Do you know how this "native" option would work later when we are checking if the build is ok for the current GPU to give a clear error message on mismatch?

@cyyever
Collaborator Author

cyyever commented Jun 24, 2025

@ngimel From nvcc documentation:

When -arch=native is specified, nvcc detects the visible GPUs on the system and generates codes for them, no PTX program will be generated for this option. It is a warning if there are no visible supported GPU on the system, and the default architecture will be used.

CMake does little work here; we rely on nvcc. (IMO they don't want to maintain these flags...)
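To illustrate the concern in the question above, here is a standalone sketch (not PyTorch's actual check) of how a build/GPU mismatch test could work when -arch=native emits SASS only: a cubin built for sm_XY is binary-compatible only with devices of the same major compute capability and minor >= Y, and with no embedded PTX there is nothing to JIT-compile for other GPUs. The arch_list argument stands in for something like torch.cuda.get_arch_list().

```python
def parse_sm(arch: str) -> tuple[int, int]:
    """Parse an arch tag such as 'sm_86' into a (major, minor) capability."""
    digits = arch.removeprefix("sm_")
    return int(digits[:-1]), int(digits[-1])

def build_supports_device(arch_list: list[str], capability: tuple[int, int]) -> bool:
    """True if some compiled SASS arch is binary-compatible with the device:
    same major capability, and compiled minor <= device minor."""
    dev_major, dev_minor = capability
    return any(
        major == dev_major and minor <= dev_minor
        for major, minor in map(parse_sm, arch_list)
    )

# e.g. a native build on an sm_86 machine ships only ["sm_86"]
print(build_supports_device(["sm_86"], (8, 6)))  # True
print(build_supports_device(["sm_86"], (9, 0)))  # False: no PTX to fall back on
```

A check along these lines could back a clear "this build does not support your GPU" error message regardless of whether the arch list came from an explicit setting or from "native".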

Contributor

@malfet malfet left a comment


This PR tries to do too many things in one go (including renames).
Can it be split into 2-3 PRs, one of which would use the new CUDAToolkit package but still define all the aliases the system used to, say set(CUDA_VERSION ${CUDAToolkit_VERSION}), etc.?

Or alternatively, have a baseline PR that changes those in the existing FindCUDA in preparation for the new package version.
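A sketch of what such a compatibility shim could look like (the right-hand variables come from the official CUDAToolkit module; the mapping to CUDA_TOOLKIT_ROOT_DIR is an assumption about what downstream code expects):

```cmake
# Sketch: define the variables the forked FindCUDAToolkit.cmake used to set,
# in terms of the official module's outputs.
find_package(CUDAToolkit REQUIRED)
set(CUDA_VERSION          ${CUDAToolkit_VERSION})
set(CUDA_INCLUDE_DIRS     ${CUDAToolkit_INCLUDE_DIRS})
set(CUDA_TOOLKIT_ROOT_DIR ${CUDAToolkit_LIBRARY_ROOT})  # assumed mapping
```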

Looks like there are some changes to how the nvrtc package is defined before/after this change. In my opinion, it would be good to keep the old definitions in place rather than pushing them into custom copy scripts, which will not run for users building outside of CI.

os.system(f"unzip {wheel_path} -d {folder}/tmp")
libs_to_copy = [
"/usr/local/cuda/extras/CUPTI/lib64/libcupti.so.12",
"/usr/local/cuda/extras/CUPTI/lib64/libnvperf_host.so",
Contributor


Why is this change necessary if the goal is just to remove FindCUDA?

Collaborator Author


Some CI jobs broke because nvperf_host.so could not be found, and nvperf_host.so is indeed required by cupti.so. If we install cupti.so, we should also install nvperf_host.so.

Collaborator


Some CI jobs broke because nvperf_host.so could not be found, and nvperf_host.so is indeed required by cupti.so

Could you link the failing jobs? I don't understand why we would need the nvperf_* libs now without changing the profiling usage in PyTorch or CUPTI itself. Why and how was profiling working before?
The nvperf_* libs are used for PC sampling, PM sampling, SASS metrics, or range profiling, and I don't see any related change in this PR, so are we using these?

Collaborator Author

@cyyever cyyever Jun 25, 2025
