Conversation

cyyever
Collaborator

@cyyever cyyever commented May 29, 2025

Use CUDA language in CMake and remove forked FindCUDAToolkit.cmake.
Some CUDA targets are also renamed with torch:: prefix.

cc @albanD


pytorch-bot bot commented May 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/154595

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 1 Unrelated Failure

As of commit ae40a3d with merge base cf4964b:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@cyyever cyyever marked this pull request as draft May 29, 2025 05:30
@cyyever
Collaborator Author

cyyever commented May 29, 2025

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label May 29, 2025
@cyyever cyyever force-pushed the cuda_language branch 2 times, most recently from e67982f to c3f0359 Compare May 29, 2025 05:55
@cyyever cyyever force-pushed the cuda_language branch 7 times, most recently from 4037656 to 9d5da10 Compare May 29, 2025 09:34
@cyyever cyyever marked this pull request as ready for review June 14, 2025 23:35
@cyyever cyyever changed the title Cuda language Use CUDA language in CMake Jun 14, 2025
@cyyever cyyever changed the title Use CUDA language in CMake Use official CUDAToolkit module in CMake Jun 14, 2025
@cyyever cyyever force-pushed the cuda_language branch 2 times, most recently from ed247ae to 22bb4d5 Compare June 15, 2025 08:04
albanD
albanD previously approved these changes Jun 16, 2025
Collaborator

@albanD albanD left a comment


Sorry for the delay on reviewing this, my review queue has been pretty backed up.
This is AMAZING!!!

The change sounds good to me (even though I'm in no way a CMake expert), but if CI/CD is happy (including the cpp extensions tests), I think we're good to go.

Let's try and land this as is!

@cyyever cyyever marked this pull request as draft June 16, 2025 23:59
@cyyever
Collaborator Author

cyyever commented Jun 16, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased cuda_language onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout cuda_language && git pull --rebase)

@cyyever
Collaborator Author

cyyever commented Jun 23, 2025

@ngimel Detection of the native GPU architecture could be changed to set(CUDA_ARCHITECTURES "native"), which essentially passes native through to nvcc. The old behavior of printing the detected architectures requires CUDA_DETECT_INSTALLED_GPUS, which is unfortunately deprecated; see https://gitlab.kitware.com/cmake/cmake/-/issues/19199.

One fix is to use the CUDA_ARCHITECTURES target property, as documented at https://cmake.org/cmake/help/latest/prop_tgt/CUDA_ARCHITECTURES.html.
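A minimal sketch of this approach (assuming CMake >= 3.24, which understands the "native" value; the target name torch_cuda here is only illustrative):

```cmake
# Sketch only: let nvcc detect the visible GPUs instead of relying on the
# deprecated CUDA_DETECT_INSTALLED_GPUS macro. CMAKE_CUDA_ARCHITECTURES
# initializes the per-target CUDA_ARCHITECTURES property.
set(CMAKE_CUDA_ARCHITECTURES native)

# Or per target (torch_cuda stands in for whichever CUDA target is configured):
set_target_properties(torch_cuda PROPERTIES CUDA_ARCHITECTURES native)
```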

@cyyever
Collaborator Author

cyyever commented Jun 23, 2025

Concretely, this snippet in CMake is wrong:

      if(CUDA_LIMIT_GPU_ARCHITECTURE AND ITEM VERSION_GREATER_EQUAL CUDA_LIMIT_GPU_ARCHITECTURE)
        list(GET CUDA_COMMON_GPU_ARCHITECTURES -1 NEWITEM)
        string(APPEND CUDA_GPU_DETECT_OUTPUT_FILTERED " ${NEWITEM}")
      else()
        string(APPEND CUDA_GPU_DETECT_OUTPUT_FILTERED " ${ITEM}")
      endif()

as it either incorrectly sets CUDA_LIMIT_GPU_ARCHITECTURE or does an incorrect comparison here, and thus sets the architecture to the last entry of CUDA_COMMON_GPU_ARCHITECTURES. I'm on CMake 3.27, but I've seen the same behavior on 4.0.
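One possible tightening, shown purely as a sketch (this is not necessarily what the actual fix does), is to skip the clamping branch unless the limit variable is genuinely defined:

```cmake
# Sketch only: apply the clamp only when a limit was explicitly set,
# so an unset CUDA_LIMIT_GPU_ARCHITECTURE never rewrites the detected arch.
if(DEFINED CUDA_LIMIT_GPU_ARCHITECTURE
   AND ITEM VERSION_GREATER_EQUAL CUDA_LIMIT_GPU_ARCHITECTURE)
  list(GET CUDA_COMMON_GPU_ARCHITECTURES -1 NEWITEM)
  string(APPEND CUDA_GPU_DETECT_OUTPUT_FILTERED " ${NEWITEM}")
else()
  string(APPEND CUDA_GPU_DETECT_OUTPUT_FILTERED " ${ITEM}")
endif()
```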

Fixed, see commit 3f789e9. Also note that this bug existed before this PR but was only revealed by these changes.

@ngimel
Collaborator

ngimel commented Jun 24, 2025

Do you know how this "native" option would work later when we are checking if the build is ok for the current GPU to give a clear error message on mismatch?

@cyyever
Collaborator Author

cyyever commented Jun 24, 2025

@ngimel From nvcc documentation:

When -arch=native is specified, nvcc detects the visible GPUs on the system and generates codes for them, no PTX program will be generated for this option. It is a warning if there are no visible supported GPU on the system, and the default architecture will be used.

CMake does little work here; we rely on nvcc. (IMO they don't want to maintain these flags...)
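To illustrate the concern in the question above, here is a standalone sketch (not PyTorch's actual check) of how a build/GPU mismatch test could work when -arch=native emits SASS only: a cubin built for sm_XY is binary-compatible only with devices of the same major compute capability and minor >= Y, and with no embedded PTX there is nothing to JIT-compile for other GPUs. The arch_list argument stands in for something like torch.cuda.get_arch_list().

```python
def parse_sm(arch: str) -> tuple[int, int]:
    """Parse an arch tag such as 'sm_86' into a (major, minor) capability."""
    digits = arch.removeprefix("sm_")
    return int(digits[:-1]), int(digits[-1])

def build_supports_device(arch_list: list[str], capability: tuple[int, int]) -> bool:
    """True if some compiled SASS arch is binary-compatible with the device:
    same major capability, and compiled minor <= device minor."""
    dev_major, dev_minor = capability
    return any(
        major == dev_major and minor <= dev_minor
        for major, minor in map(parse_sm, arch_list)
    )

# e.g. a native build on an sm_86 machine ships only ["sm_86"]
print(build_supports_device(["sm_86"], (8, 6)))  # True
print(build_supports_device(["sm_86"], (9, 0)))  # False: no PTX to fall back on
```

A check along these lines could back a clear "this build does not support your GPU" error message regardless of whether the arch list came from an explicit setting or from "native".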

Contributor

@malfet malfet left a comment


This PR tries to do too many things in one go (including renames).
Can it be split into 2-3 PRs, one of which would use the new CUDAToolkit package but still define all the aliases the system used to, say set(CUDA_VERSION ${CUDAToolkit_VERSION}), etc.?

Or alternatively, have a baseline PR that changes those in the existing FindCUDA in preparation for the new package version.
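A sketch of what such a compatibility shim could look like (the right-hand variables come from the official CUDAToolkit module; the mapping to CUDA_TOOLKIT_ROOT_DIR is an assumption about what downstream code expects):

```cmake
# Sketch: define the variables the forked FindCUDAToolkit.cmake used to set,
# in terms of the official module's outputs.
find_package(CUDAToolkit REQUIRED)
set(CUDA_VERSION          ${CUDAToolkit_VERSION})
set(CUDA_INCLUDE_DIRS     ${CUDAToolkit_INCLUDE_DIRS})
set(CUDA_TOOLKIT_ROOT_DIR ${CUDAToolkit_LIBRARY_ROOT})  # assumed mapping
```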

Looks like there are some changes to how the nvrtc package is defined before/after this change. In my opinion, it would be good to keep the old definitions in place rather than pushing them into custom copy scripts, which will not run for users building outside of CI.

os.system(f"unzip {wheel_path} -d {folder}/tmp")
libs_to_copy = [
"/usr/local/cuda/extras/CUPTI/lib64/libcupti.so.12",
"/usr/local/cuda/extras/CUPTI/lib64/libnvperf_host.so",
Contributor


Why is this change necessary if the goal is just to remove FindCUDA?

Collaborator Author


Some CI jobs broke because nvperf_host.so could not be found, and nvperf_host.so is indeed required by cupti.so. If we install cupti.so, we should also install nvperf_host.so.

Collaborator


Some CI jobs broke because nvperf_host.so could not be found, and nvperf_host.so is indeed required by cupti.so

Could you link the failing jobs? I don't understand why we would need the nvperf_* libs now without changing the profiling usage in PyTorch or CUPTI itself. Why and how was profiling working before?
The nvperf_* libs are used for PC sampling, PM sampling, SASS metrics, or range profiling, and I don't see any related change in this PR, so are we using these?

Collaborator Author

@cyyever cyyever Jun 25, 2025
