Labels: enhancement (New feature or request), pallas (Issues pertaining to Pallas (GPU or TPU))
Description
The JAX experimental Pallas kernels for TPU support attention logits softcapping in the paged attention kernel, but not in the flash attention kernel.
It would be great if support were added to the Pallas flash attention kernel as well, since it could then be used in the PyTorch/XLA integration and in the vLLM implementation.
The Gemma 2 9B model works even without logit softcapping, but the 27B model does not.
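For context, logit softcapping is just a tanh squashing of the attention logits before the softmax. Below is a minimal plain-JAX sketch of where the transform fits in the attention math; the `soft_cap=50.0` default is an assumption matching Gemma 2's attention config, and the function is only a reference illustration, not the Pallas flash kernel itself (which would need to apply the same transform to its blockwise logits).

```python
import jax
import jax.numpy as jnp

def attention_with_softcap(q, k, v, soft_cap=50.0):
    # Reference (non-Pallas) attention showing where logit softcapping fits.
    # Shapes: q [..., q_len, heads, head_dim], k/v [..., kv_len, heads, head_dim].
    logits = jnp.einsum("...qhd,...khd->...hqk", q, k) / jnp.sqrt(q.shape[-1])
    # Softcapping: smoothly bound the logits to (-soft_cap, soft_cap) before softmax.
    logits = soft_cap * jnp.tanh(logits / soft_cap)
    weights = jax.nn.softmax(logits, axis=-1)
    return jnp.einsum("...hqk,...khd->...qhd", weights, v)
```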
Related links:
- PR adding softcapping support for the paged attention kernel
- PyTorch/XLA custom kernel integration for paged attention
- Need for flash attention support for running Gemma 2 with vLLM on TPUs