
Conversation

@huydhn (Contributor) commented Sep 18, 2025

Testing now that vllm-project/vllm#24599 has been merged

pytorch-bot bot commented Sep 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163239

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit fc0614d with merge base 4b7aed8:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Aidyn-A (Collaborator) commented Sep 18, 2025

Hmm... These segmentation faults are annoying:

2025-09-18T03:56:20.8919447Z #21 260.0 sh: line 1:  1716 Segmentation fault      (core dumped) ptxas -arch=sm_90a -m64 -v --generate-line-info "/tmp/tmpxft_00000199_00000000-6_flash_fwd_hdim128_bf16_sm100.compute_90a.ptx" -o "/tmp/tmpxft_00000199_00000000-11_flash_fwd_hdim128_bf16_sm100.compute_90a.cubin" > /tmp/tmpxft_00000199_00000000-13_189d18d0_stdout 2> /tmp/tmpxft_00000199_00000000-13_189d18d0_stderr
...
2025-09-18T03:56:26.6301630Z #21 265.7 sh: line 1:  1766 Segmentation fault      (core dumped) ptxas -arch=sm_90a -m64 -v --generate-line-info "/tmp/tmpxft_0000019c_00000000-6_flash_fwd_hdim128_bf16_sm90.compute_90a.ptx" -o "/tmp/tmpxft_0000019c_00000000-11_flash_fwd_hdim128_bf16_sm90.compute_90a.cubin" > /tmp/tmpxft_0000019c_00000000-13_403da8f0_stdout 2> /tmp/tmpxft_0000019c_00000000-13_403da8f0_stderr
...
2025-09-18T03:56:35.8654592Z #21 275.0 sh: line 1:  1813 Segmentation fault      (core dumped) ptxas -arch=sm_90a -m64 -v --generate-line-info "/tmp/tmpxft_000001b1_00000000-6_flash_fwd_hdim128_fp16_sm90.compute_90a.ptx" -o "/tmp/tmpxft_000001b1_00000000-11_flash_fwd_hdim128_fp16_sm90.compute_90a.cubin" > /tmp/tmpxft_000001b1_00000000-13_3bfe3ad0_stdout 2> /tmp/tmpxft_000001b1_00000000-13_3bfe3ad0_stderr
...
2025-09-18T03:58:44.1437362Z #21 403.4 sh: line 1:  2262 Segmentation fault      (core dumped) ptxas -arch=sm_90a -m64 -v --generate-line-info "/tmp/tmpxft_00000600_00000000-6_flash_fwd_hdim192_128_bf16_sm90.compute_90a.ptx" -o "/tmp/tmpxft_00000600_00000000-11_flash_fwd_hdim192_128_bf16_sm90.compute_90a.cubin" > /tmp/tmpxft_00000600_00000000-13_344c4aa0_stdout 2> /tmp/tmpxft_00000600_00000000-13_344c4aa0_stderr
...
2025-09-18T03:58:53.9488721Z #21 413.2 sh: line 1:  2280 Segmentation fault      (core dumped) ptxas -arch=sm_90a -m64 -v --generate-line-info "/tmp/tmpxft_00000679_00000000-6_flash_fwd_hdim192_128_fp16_sm90.compute_90a.ptx" -o "/tmp/tmpxft_00000679_00000000-11_flash_fwd_hdim192_128_fp16_sm90.compute_90a.cubin" > /tmp/tmpxft_00000679_00000000-13_1bcf8520_stdout 2> /tmp/tmpxft_00000679_00000000-13_1bcf8520_stderr
...
2025-09-18T03:59:43.9530305Z #21 463.1 sh: line 1:  2325 Segmentation fault      (core dumped) ptxas -arch=sm_90a -m64 -v --generate-line-info "/tmp/tmpxft_00000769_00000000-6_flash_fwd_hdim192_bf16_sm90.compute_90a.ptx" -o "/tmp/tmpxft_00000769_00000000-11_flash_fwd_hdim192_bf16_sm90.compute_90a.cubin" > /tmp/tmpxft_00000769_00000000-13_2d628f0_stdout 2> /tmp/tmpxft_00000769_00000000-13_2d628f0_stderr

One noticeable fact is that they are all failing on sm_90a.
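
For what it's worth, a quick way to confirm that pattern from a saved copy of the build output (the build.log filename here is just a placeholder, not an actual CI artifact path):

# Count the segfaulting ptxas invocations per target architecture.
grep "Segmentation fault" build.log | grep -o "arch=sm_[0-9a-z]*" | sort | uniq -c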

@huydhn (Contributor, Author) commented Sep 18, 2025

Yeah, they are coming from compiling xformers https://github.com/facebookresearch/xformers/releases/tag/v0.0.32.post2 on aarch64. I don't know what the issue is about yet, so I'd appreciate any thoughts you have.

@Aidyn-A (Collaborator) commented Sep 18, 2025

I have not encountered segfaults like that, but my first action would be decreasing MAX_JOBS because those CUTLASS kernels are extremely compile-hungry.
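
Something along these lines, as a rough sketch only (the value 4 and the exact pip invocation are assumptions, not the actual CI command):

# Hypothetical example: limit parallel nvcc/ptxas jobs so the CUTLASS-heavy
# xformers build does not exhaust the builder's memory; tune the value to the
# available RAM.
export MAX_JOBS=4
pip install -v --no-build-isolation "git+https://github.com/facebookresearch/xformers.git@v0.0.32.post2"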

@huydhn (Contributor, Author) commented Sep 18, 2025

> I have not encountered segfaults like that, but my first action would be decreasing MAX_JOBS because those CUTLASS kernels are extremely compile-hungry.

Ohh, you're spot on, it works after I lower MAX_JOBS... actually, I spoke too soon: CI hasn't run yet because of the merge conflicts, hence the green CI signals >_<

@huydhn (Contributor, Author) commented Sep 20, 2025

This is currently blocked by a segfault on ptxas -arch=sm_90a that @Aidyn-A discovered. We have only seen it on aarch64, but x86 might be affected too. Maybe I could try my luck and skip the aarch64 build for now.
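
A rough sketch of the kind of guard that could do it (the script layout and the install command are assumptions, not the actual CI scripts):

# Hypothetical guard in the wheel-build step: skip xformers on aarch64
# until the ptxas -arch=sm_90a segfault is resolved.
if [ "$(uname -m)" = "aarch64" ]; then
  echo "Skipping xformers build on aarch64 (ptxas sm_90a segfault)"
else
  MAX_JOBS=4 pip install -v --no-build-isolation "git+https://github.com/facebookresearch/xformers.git@v0.0.32.post2"
fi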

@huydhn (Contributor, Author) commented Sep 23, 2025

@pytorchbot rebase -b main

@pytorchmergebot (Collaborator)

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot (Collaborator)

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/main pull/163239/head returned non-zero exit code 1

Rebasing (1/2)
Auto-merging .github/ci_commit_pins/vllm.txt
CONFLICT (content): Merge conflict in .github/ci_commit_pins/vllm.txt
error: could not apply 82df8a8a0ee... Build vLLM nightly wheels for CUDA 13.0
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Could not apply 82df8a8a0ee... # Build vLLM nightly wheels for CUDA 13.0

Raised by https://github.com/pytorch/pytorch/actions/runs/17938711036
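
For reference, a minimal sketch of the manual resolution flow the hints above describe (the push step is an assumption about the usual follow-up, not something the bot runs):

# Rebase the PR branch locally and resolve the vLLM pin conflict by hand.
git fetch origin main
git rebase origin/main
# edit .github/ci_commit_pins/vllm.txt to keep the intended vLLM commit, then:
git add .github/ci_commit_pins/vllm.txt
git rebase --continue
git push --force-with-lease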

@ptrblck (Collaborator) commented Sep 24, 2025

> Yeah, they are coming from compiling xformers...

@huydhn do we know if flash-attn is also built as part of xformers? If so, this fix might be needed: https://github.com/Dao-AILab/flash-attention/pull/1860/files

@johnnynunez (Contributor)

fixed: facebookresearch/xformers#1337
cc @Aidyn-A

@huydhn (Contributor, Author) commented Sep 26, 2025

Thanks @johnnynunez for the fix! And yes, xformers builds flash-attn.

@johnnynunez (Contributor) commented Oct 3, 2025

@ptrblck @huydhn all PRs necessary for vLLM CUDA 13 were merged in public vLLM (including flash-attention and the Blackwell family + CUTLASS v4.2.1). The only one still missing is facebookresearch/xformers#1337. I think it is not merged yet because it was pointing to 2.9.0 and CUDA 13.0, and the tests fail because those don't exist yet.

