
Conversation

@huydhn (Contributor) commented Sep 18, 2025

Testing now that vllm-project/vllm#24599 has been merged

pytorch-bot bot commented Sep 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163239

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit fc0614d with merge base 4b7aed8:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Aidyn-A (Collaborator) commented Sep 18, 2025

Hmm... These segmentation faults are annoying:

2025-09-18T03:56:20.8919447Z #21 260.0 sh: line 1:  1716 Segmentation fault      (core dumped) ptxas -arch=sm_90a -m64 -v --generate-line-info "/tmp/tmpxft_00000199_00000000-6_flash_fwd_hdim128_bf16_sm100.compute_90a.ptx" -o "/tmp/tmpxft_00000199_00000000-11_flash_fwd_hdim128_bf16_sm100.compute_90a.cubin" > /tmp/tmpxft_00000199_00000000-13_189d18d0_stdout 2> /tmp/tmpxft_00000199_00000000-13_189d18d0_stderr
...
2025-09-18T03:56:26.6301630Z #21 265.7 sh: line 1:  1766 Segmentation fault      (core dumped) ptxas -arch=sm_90a -m64 -v --generate-line-info "/tmp/tmpxft_0000019c_00000000-6_flash_fwd_hdim128_bf16_sm90.compute_90a.ptx" -o "/tmp/tmpxft_0000019c_00000000-11_flash_fwd_hdim128_bf16_sm90.compute_90a.cubin" > /tmp/tmpxft_0000019c_00000000-13_403da8f0_stdout 2> /tmp/tmpxft_0000019c_00000000-13_403da8f0_stderr
...
2025-09-18T03:56:35.8654592Z #21 275.0 sh: line 1:  1813 Segmentation fault      (core dumped) ptxas -arch=sm_90a -m64 -v --generate-line-info "/tmp/tmpxft_000001b1_00000000-6_flash_fwd_hdim128_fp16_sm90.compute_90a.ptx" -o "/tmp/tmpxft_000001b1_00000000-11_flash_fwd_hdim128_fp16_sm90.compute_90a.cubin" > /tmp/tmpxft_000001b1_00000000-13_3bfe3ad0_stdout 2> /tmp/tmpxft_000001b1_00000000-13_3bfe3ad0_stderr
...
2025-09-18T03:58:44.1437362Z #21 403.4 sh: line 1:  2262 Segmentation fault      (core dumped) ptxas -arch=sm_90a -m64 -v --generate-line-info "/tmp/tmpxft_00000600_00000000-6_flash_fwd_hdim192_128_bf16_sm90.compute_90a.ptx" -o "/tmp/tmpxft_00000600_00000000-11_flash_fwd_hdim192_128_bf16_sm90.compute_90a.cubin" > /tmp/tmpxft_00000600_00000000-13_344c4aa0_stdout 2> /tmp/tmpxft_00000600_00000000-13_344c4aa0_stderr
...
2025-09-18T03:58:53.9488721Z #21 413.2 sh: line 1:  2280 Segmentation fault      (core dumped) ptxas -arch=sm_90a -m64 -v --generate-line-info "/tmp/tmpxft_00000679_00000000-6_flash_fwd_hdim192_128_fp16_sm90.compute_90a.ptx" -o "/tmp/tmpxft_00000679_00000000-11_flash_fwd_hdim192_128_fp16_sm90.compute_90a.cubin" > /tmp/tmpxft_00000679_00000000-13_1bcf8520_stdout 2> /tmp/tmpxft_00000679_00000000-13_1bcf8520_stderr
...
2025-09-18T03:59:43.9530305Z #21 463.1 sh: line 1:  2325 Segmentation fault      (core dumped) ptxas -arch=sm_90a -m64 -v --generate-line-info "/tmp/tmpxft_00000769_00000000-6_flash_fwd_hdim192_bf16_sm90.compute_90a.ptx" -o "/tmp/tmpxft_00000769_00000000-11_flash_fwd_hdim192_bf16_sm90.compute_90a.cubin" > /tmp/tmpxft_00000769_00000000-13_2d628f0_stdout 2> /tmp/tmpxft_00000769_00000000-13_2d628f0_stderr

One noticeable fact is that they are all failing on sm_90a.
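
For what it's worth, a quick way to confirm that pattern from a saved copy of the build output (the build.log filename here is just a placeholder, not an actual CI artifact path):

# Count the segfaulting ptxas invocations per target architecture.
grep "Segmentation fault" build.log | grep -o "arch=sm_[0-9a-z]*" | sort | uniq -c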

@huydhn (Contributor, Author) commented Sep 18, 2025

Yeah, they are coming from compiling xformers https://github.com/facebookresearch/xformers/releases/tag/v0.0.32.post2 on aarch64. I don't know what the issue is about yet, so I'd appreciate any thoughts you have.

@Aidyn-A (Collaborator) commented Sep 18, 2025

I have not encountered segfaults like that, but my first action would be decreasing MAX_JOBS because those CUTLASS kernels are extremely compile-hungry.
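
Something along these lines, as a rough sketch only (the value 4 and the exact pip invocation are assumptions, not the actual CI command):

# Hypothetical example: limit parallel nvcc/ptxas jobs so the CUTLASS-heavy
# xformers build does not exhaust the builder's memory; tune the value to the
# available RAM.
export MAX_JOBS=4
pip install -v --no-build-isolation "git+https://github.com/facebookresearch/xformers.git@v0.0.32.post2"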

@huydhn (Contributor, Author) commented Sep 18, 2025

> I have not encountered segfaults like that, but my first action would be decreasing MAX_JOBS because those CUTLASS kernels are extremely compile-hungry.

Ohh, you're spot on, it works after I lower MAX_JOBS... actually, I spoke too soon: CI hasn't run yet because of the merge conflicts, hence the green CI signals >_<

@huydhn (Contributor, Author) commented Sep 20, 2025

This is currently blocked by a segfault on ptxas -arch=sm_90a that @Aidyn-A discovered. We have only seen it on aarch64, but x86 might be affected too. Maybe I could try my luck and skip the aarch64 build for now.
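
A rough sketch of the kind of guard that could do it (the script layout and the install command are assumptions, not the actual CI scripts):

# Hypothetical guard in the wheel-build step: skip xformers on aarch64
# until the ptxas -arch=sm_90a segfault is resolved.
if [ "$(uname -m)" = "aarch64" ]; then
  echo "Skipping xformers build on aarch64 (ptxas sm_90a segfault)"
else
  MAX_JOBS=4 pip install -v --no-build-isolation "git+https://github.com/facebookresearch/xformers.git@v0.0.32.post2"
fi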

@huydhn (Contributor, Author) commented Sep 23, 2025

@pytorchbot rebase -b main

@pytorchmergebot (Collaborator)

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot (Collaborator)

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/main pull/163239/head returned non-zero exit code 1

Rebasing (1/2)
Auto-merging .github/ci_commit_pins/vllm.txt
CONFLICT (content): Merge conflict in .github/ci_commit_pins/vllm.txt
error: could not apply 82df8a8a0ee... Build vLLM nightly wheels for CUDA 13.0
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Could not apply 82df8a8a0ee... # Build vLLM nightly wheels for CUDA 13.0

Raised by https://github.com/pytorch/pytorch/actions/runs/17938711036
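
For reference, a minimal sketch of the manual resolution flow the hints above describe (the push step is an assumption about the usual follow-up, not something the bot runs):

# Rebase the PR branch locally and resolve the vLLM pin conflict by hand.
git fetch origin main
git rebase origin/main
# edit .github/ci_commit_pins/vllm.txt to keep the intended vLLM commit, then:
git add .github/ci_commit_pins/vllm.txt
git rebase --continue
git push --force-with-lease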

@ptrblck (Collaborator) commented Sep 24, 2025

> Yeah, they are coming from compiling xformers...

@huydhn do we know if flash-attn is also built as part of xformers? If so, this fix might be needed: https://github.com/Dao-AILab/flash-attention/pull/1860/files

@johnnynunez (Contributor)

fixed: facebookresearch/xformers#1337
cc @Aidyn-A

@huydhn (Contributor, Author) commented Sep 26, 2025

Thanks @johnnynunez for the fix! And yes, xformers builds flash-attn.

@johnnynunez (Contributor) commented Oct 3, 2025

@ptrblck @huydhn all PRs necessary for vLLM CUDA 13 were merged in public vLLM (including flash-attention and the Blackwell family + CUTLASS v4.2.1). The only one still missing is facebookresearch/xformers#1337. I think it is not merged yet because it was pointing to 2.9.0 and CUDA 13.0, and the tests fail because those don't exist yet.

