@johnnynunez johnnynunez commented Sep 26, 2025

What does this PR do?

Fixes #1320, #1308, and #1323. Includes the flash-attention fixes for CUDA >= 12.9 and adds CUTLASS v4.2.1, which fixes some kernels for Blackwell.
Also adds support for Spark and Thor.
Adds Blackwell family-specific support: https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/
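
For readers unfamiliar with family-specific targets, here is a minimal sketch of how a build script might pick nvcc `-gencode` flags. It assumes the `f` suffix spelling (for example `compute_100f` / `sm_100f`) described in the NVIDIA post linked above. The helper name `gencode_flags` and the rule "use the family suffix for compute capability major >= 10 on CUDA >= 12.9" are illustrative assumptions, not code taken from this PR.

```python
from typing import List, Tuple


def gencode_flags(archs: List[Tuple[int, int]], cuda_version: Tuple[int, int]) -> List[str]:
    """Hypothetical helper: build nvcc -gencode arguments for a list of
    (major, minor) compute capabilities.

    Assumption: family-specific targets use an 'f' suffix and are only
    requested for major >= 10 when the toolkit is CUDA 12.9 or newer.
    """
    flags: List[str] = []
    for major, minor in archs:
        num = f"{major}{minor}"
        use_family = cuda_version >= (12, 9) and major >= 10
        suffix = "f" if use_family else ""
        flags += ["-gencode", f"arch=compute_{num}{suffix},code=sm_{num}{suffix}"]
    return flags


if __name__ == "__main__":
    # Example: Hopper plus one Blackwell target, built with CUDA 12.9.
    print(" ".join(gencode_flags([(9, 0), (10, 0)], (12, 9))))
```

With an older toolkit, such as `(12, 8)`, the same call falls back to plain `compute_100` / `sm_100` targets, so one arch list can serve several CUDA versions.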

Thanks to #1285 and #1262, which are included here.

Fixes in flash-attention to support CUDA 13:

  1. Upgrade to cutlass v4.2.1: Dao-AILab/flash-attention#1905
  2. C++11 fix warnings: Dao-AILab/flash-attention#1904
  3. [NVIDIA] Enable Blackwell Family Specific: Dao-AILab/flash-attention#1882
  4. [BUILD] SBSA wheels + CUDA 13 Support: Dao-AILab/flash-attention#1865
  5. [BUG] CUDA 13: make FA3 compatible with CUDA 13 Builds: Dao-AILab/flash-attention#1860

PyTorch 2.9.0: https://dev-discuss.pytorch.org/t/pytorch-2-9-rc1-produced-for-pytorch-audio-vision/3234

cc @sgrigory

@meta-cla meta-cla bot added the CLA Signed label Sep 26, 2025
@johnnynunez johnnynunez marked this pull request as draft September 26, 2025 10:27
@johnnynunez johnnynunez marked this pull request as ready for review September 26, 2025 10:46
@johnnynunez johnnynunez marked this pull request as draft September 26, 2025 17:22
@johnnynunez johnnynunez marked this pull request as ready for review September 26, 2025 18:41
@johnnynunez
Author

Also fixes #1335.

@johnnynunez
Author

johnnynunez commented Sep 26, 2025

cc @sgrigory @takuma104 @bottler: ready to merge.

@johnnynunez
Author

This is ready for PyTorch 2.9. Feel free to change the builds to PyTorch 2.8.0 for now.

@johnnynunez johnnynunez closed this Oct 9, 2025
Linked issue (may be closed by merging this pull request): cu129 ERROR