Pull requests: jax-ml/jax
[Pallas:MGPU] Use a much better matmul kernel in the collective matmul
#32010
by copybara-service
bot
was merged Sep 24, 2025
updated Sep 24, 2025
[Pallas:TPU] Use the lowering backend to query the libTPU version
#31591
by copybara-service
bot
was closed Sep 22, 2025
updated Sep 22, 2025
[Pallas:MGPU] Add a new epilogue to the Hopper matmul kernel
#31692
by copybara-service
bot
was closed Sep 22, 2025
updated Sep 22, 2025
[Mosaic GPU] Cast i4->bf16 through i32
#32009
by copybara-service
bot
was merged Sep 22, 2025
updated Sep 22, 2025
[Mosaic GPU] Add a dedicated custom call for preallocating outputs
#31116
by copybara-service
bot
was closed Sep 22, 2025
updated Sep 22, 2025
[Debugging] platform init failures
#31506
by copybara-service
bot
was closed Sep 22, 2025
updated Sep 22, 2025
[Pallas:MGPU] Add more API links in the reference guide
documentation
#29579
by copybara-service
bot
was closed Sep 22, 2025
updated Sep 22, 2025
[Mosaic GPU] Add support for WGMMA/TCGEN05_TRANSPOSED_LAYOUT layout casts for 32-bit dtypes
#31957
by copybara-service
bot
was merged Sep 22, 2025
updated Sep 22, 2025
[Pallas:MGPU] Add double buffering in Blackwell matmul epilogue
#31962
by copybara-service
bot
was closed Sep 22, 2025
updated Sep 22, 2025
[Mosaic GPU] Optimize int4->int8 casts following relayouts
#31950
by copybara-service
bot
was merged Sep 19, 2025
updated Sep 19, 2025
[Mosaic GPU] Add an optimized conversion routine for int4 to int8 casts
#31949
by copybara-service
bot
was merged Sep 19, 2025
updated Sep 19, 2025
[Pallas:MGPU] Add support for clusters and TMA multicast in the Hopper matmul
#31782
by copybara-service
bot
was merged Sep 19, 2025
updated Sep 19, 2025
[Pallas:MGPU] Add support for collective copies in pipelining helpers
#31781
by copybara-service
bot
was merged Sep 19, 2025
updated Sep 19, 2025
Strengthen the tests for type conversions
#31948
by copybara-service
bot
was merged Sep 19, 2025
updated Sep 19, 2025
[Pallas:MGPU] Allow changing the number of arrivals on cluster barriers
#31780
by copybara-service
bot
was merged Sep 18, 2025
updated Sep 18, 2025
[Mosaic GPU] Allow varying the arrival count on cluster barriers
#31779
by copybara-service
bot
was merged Sep 18, 2025
updated Sep 18, 2025
[Pallas:MGPU] Add tests for the newly added tuning options
#31776
by copybara-service
bot
was merged Sep 18, 2025
updated Sep 18, 2025
[Pallas:MGPU] Use the planar_snake grid iteration order to improve L2 hit rate in the Blackwell matmul
#31773
by copybara-service
bot
was merged Sep 12, 2025
updated Sep 12, 2025
[Pallas:MGPU] Allow tuning the non-contracting dimension split over compute WGs in Hopper matmul
#31746
by copybara-service
bot
was merged Sep 12, 2025
updated Sep 12, 2025
[Pallas:MGPU] Avoid bubbles between steps of the MN loop in the Hopper matmul
#31737
by copybara-service
bot
was merged Sep 12, 2025
updated Sep 12, 2025
[Pallas:MGPU] Add support for tuning epilogue_tile_n in Blackwell matmul
#31749
by copybara-service
bot
was merged Sep 11, 2025
updated Sep 11, 2025
[Pallas:MGPU] Use a single barrier to wait for A and B transfers in Blackwell matmul
#31748
by copybara-service
bot
was merged Sep 11, 2025
updated Sep 11, 2025
[Pallas:MGPU] Don't use delay_release in Hopper MGPU matmul example
#31736
by copybara-service
bot
was merged Sep 11, 2025
updated Sep 11, 2025
[Pallas:MGPU] Use the planar_snake CTA order to improve L2 cache hits in Hopper matmul
#31735
by copybara-service
bot
was merged Sep 11, 2025
updated Sep 11, 2025
[Pallas:MGPU] Add support for tiling the TMA epilogue
#31699
by copybara-service
bot
was merged Sep 11, 2025
updated Sep 11, 2025