eqy (Collaborator) commented Aug 29, 2025

For #161822

Basically, #125888 appears to introduce measurable CPU overhead due to TF32 precision-setting checks. Unfortunately, one of the checks is in getCurrentCUDABlasHandle, which is called before every cuBLAS matmul (and in a few other places such as workspace setup). We don't need to do this for non-float32 matmuls, so this PR alleviates the performance hit where it hurts the most (smaller dtypes that have faster matmuls), at the cost of some copypasta.
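The idea can be sketched roughly as follows: only consult the float32 precision setting when the matmul is actually float32, so smaller dtypes never pay for the string comparison. This is a minimal sketch; Dtype, float32Precision, and shouldUseTF32 are illustrative stand-ins, not the actual ATen API.

```cpp
#include <cassert>
#include <string>

// Illustrative dtype tag (stand-in for at::ScalarType).
enum class Dtype { Float32, BFloat16, Float16 };

// Stand-in for the comparatively expensive context lookup that
// #125888 introduced (string-valued setting plus validation).
inline std::string float32Precision() {
  return "tf32";
}

inline bool shouldUseTF32(Dtype dtype) {
  // Fast path: non-float32 matmuls never consult the precision
  // setting, so they skip the string comparison entirely.
  if (dtype != Dtype::Float32) {
    return false;
  }
  // Slow path (fp32 only): do the string-valued lookup as before.
  return float32Precision() == "tf32";
}
```

The bfloat16 benchmark below exercises only the fast path, which is where the microbenchmark improvement comes from.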

Some microbenchmark runs with this PR (microseconds per matmul):

7.812199214640714 
8.07804053692962 
7.865882366786536
7.898942214978888
8.018492849259928

and without:

8.222943563396257
8.129948014357069
8.184711361991504
8.28010104214627
8.266569921033806

script:

import torch
import time

warmup = 128
iters = 16384

a = torch.zeros(512, 512, device='cuda', dtype=torch.bfloat16)
for _ in range(warmup):
    torch.matmul(a, a)

torch.cuda.synchronize()
t0 = time.perf_counter()
for _ in range(iters):
    torch.matmul(a, a)
torch.cuda.synchronize()
t1 = time.perf_counter()
print(f"{1e6 * (t1 - t0)/iters}")

Longer term, we'd prefer to make float32Precision itself faster (better data structures, less validation, etc.).

cc @ptrblck @msaroufim @jerryzh168 @csarofeen @xwang233 @zasdfgbnm

eqy requested a review from syed-ahmed as a code owner, August 29, 2025.
eqy added labels Aug 29, 2025: module: cuda, module: cublas, open source, module: tf32, topic: not user facing.
pytorch-bot commented Aug 29, 2025

See artifacts and rendered test results at hud.pytorch.org/pr/161823

Note: Links to docs will display an error until the docs builds have been completed.

1 new failure as of commit b0b4c4b with merge base 93c5112:

NEW FAILURE - The following job has failed:

  • pull / linux-jammy-rocm-py3.10 / build (gh)
    /var/lib/jenkins/workspace/aten/src/ATen/hip/CublasHandlePool.cpp:354:55: error: ‘CUBLAS_DEFAULT_MATH’ was not declared in this scope; did you mean ‘HIPBLAS_DEFAULT_MATH’?
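The error suggests the new code uses a CUDA-only constant that the ROCm build does not define, so the reference needs a backend guard (or the hipBLAS name). A minimal sketch of the guard pattern; the zero stub values and the USE_ROCM macro usage here are assumptions for illustration, not the real enum values:

```cpp
#include <cassert>

// Stub values standing in for the real cuBLAS/hipBLAS enums (illustrative).
#define CUBLAS_DEFAULT_MATH 0
#define HIPBLAS_DEFAULT_MATH 0

// Shared code must pick the constant per backend; the build failure above
// indicates CUBLAS_DEFAULT_MATH was not translated for the HIP build.
#if defined(USE_ROCM)
static const int kDefaultMathMode = HIPBLAS_DEFAULT_MATH;
#else
static const int kDefaultMathMode = CUBLAS_DEFAULT_MATH;
#endif
```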

This comment was automatically generated by Dr. CI and updates every 15 minutes.

// See Note [Writing Nondeterministic Operations]
globalContext().alertCuBLASConfigNotDeterministic();
cublasHandle_t handle = at::cuda::getCurrentCUDABlasHandle();
if (!NoTF32Guard::should_disable_tf32() &&
A collaborator commented on the code above:

Why not just delegate this to an inline function instead of copy-paste?

malfet (Contributor) left a comment:
This feels mildly wrong... i.e., replacing one check with 30+ in different places is bound to cause errors. Where is the overhead coming from? String comparison? Then it should be replaced by an enum.

eqy (Collaborator, author) commented Aug 29, 2025

This feels mildly wrong... i.e., replacing one check with 30+ in different places is bound to cause errors. Where is the overhead coming from? String comparison? Then it should be replaced by an enum.

I brought this up in the original PR (#125888): there are multiple reasons string comparison is used rather than an enum, but it's not that simple either, as there are multiple levels of string comparison (e.g., for backend and for precision), along with excessive validation checks and reference chasing.

I think with the changes to inline it, this would be cleaner.
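malfet's enum suggestion could look roughly like this: parse the user-facing string once, at set time, so the per-call hot path is a plain integer compare rather than (possibly several levels of) string comparison. Float32Precision and parseFloat32Precision are hypothetical names, not the actual PyTorch API, and a real version would also need to handle the per-backend level eqy mentions.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical internal representation of the float32 precision setting.
enum class Float32Precision { IEEE, TF32 };

// Validate and convert the user-facing string exactly once, when the
// setting changes; the result would be cached in the context.
inline Float32Precision parseFloat32Precision(const std::string& s) {
  if (s == "tf32") return Float32Precision::TF32;
  if (s == "ieee") return Float32Precision::IEEE;
  throw std::invalid_argument("unknown float32 precision: " + s);
}
```

The hot path (e.g., inside getCurrentCUDABlasHandle) would then compare the cached enum value, which is a single integer comparison per call.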

ngimel (Collaborator) commented Aug 31, 2025

Should we revert #125888 until we come up with a better design? It seems like it created a lot of problems without solving any.

malfet (Contributor) commented Sep 1, 2025

Should we revert #125888 until we come up with a better design? It seems like it created a lot of problems without solving any.

I think reverting is tricky, as we've technically released 2.8 with it, but I agree, maybe it's better to just revert.
@albanD : what do you think?

ngimel (Collaborator) commented Sep 1, 2025

Also, I've looked at the discussion on #125888, and it seems there were no serious arguments for strings. They want strings at the Python level, fine, but that's no reason to have a string comparison for every cuBLAS call.

zou3519 added the triaged label Sep 3, 2025.