Starred Repositories
FlashMLA: Efficient Multi-head Latent Attention Kernels
DeepEP: An efficient expert-parallel communication library
DeepGEMM: Clean and efficient FP8 GEMM kernels with fine-grained scaling
torchao: PyTorch-native quantization and sparsity for training and inference
gpt-fast: Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python
unit-scaling: A library for unit scaling in PyTorch
bitsandbytes: Accessible large language models via k-bit quantization for PyTorch
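To illustrate the "fine-grained scaling" idea behind FP8 GEMM kernels like DeepGEMM, here is a minimal plain-Python sketch (not the library's actual implementation): each contiguous block of values gets its own scale factor, so one outlier cannot destroy the precision of every other block in the row. The function names and `block` parameter are illustrative; `qmax=448.0` is the largest normal value of the FP8 E4M3 format.

```python
def per_block_scales(row, block=4, qmax=448.0):
    """Compute one scale per contiguous block of `block` values.

    Scaling each block so its largest magnitude maps to qmax keeps
    small-valued blocks from losing precision to a single large
    outlier elsewhere in the row (the core of fine-grained scaling).
    """
    scales = []
    for i in range(0, len(row), block):
        chunk = row[i:i + block]
        amax = max(abs(v) for v in chunk) or 1.0  # avoid divide-by-zero
        scales.append(amax / qmax)
    return scales

def scale_row(row, scales, block=4):
    """Divide each value by its block's scale; results fit in [-qmax, qmax]."""
    return [v / scales[i // block] for i, v in enumerate(row)]

# A row with one tiny block and one huge block: per-block scaling lets
# both use the full FP8 range, where a single per-row scale would not.
row = [0.01, 0.02, -0.01, 0.005, 100.0, -50.0, 25.0, 75.0]
scales = per_block_scales(row)
scaled = scale_row(row, scales)
```

A real FP8 GEMM kernel would additionally cast the scaled values to E4M3 and carry the per-block scales into the accumulation; this sketch only shows the scaling step.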
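As a rough sketch of the k-bit quantization idea behind libraries like bitsandbytes and torchao (this is not either library's API, just the concept in plain Python): symmetric absmax quantization maps floats to a small signed-integer range with one shared scale, and dequantization recovers an approximation.

```python
def absmax_quantize(values, bits=8):
    """Symmetric absmax quantization: map floats to signed k-bit ints.

    The largest magnitude in `values` is mapped to the top of the
    integer range (127 for int8); everything else scales proportionally.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax or 1.0  # avoid zero scale
    quantized = [round(v / scale) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]

weights = [0.1, -0.5, 0.25, 1.0]
codes, scale = absmax_quantize(weights)
restored = dequantize(codes, scale)
```

The round-trip error per value is at most half the scale, which is why storing weights as k-bit codes plus a scale makes large models fit in far less memory at modest accuracy cost; production libraries refine this with block-wise scales and outlier handling.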