BurkeHulk


FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ · 12,487 stars · 984 forks · Updated Feb 6, 2026

DeepEP: an efficient expert-parallel communication library

Cuda · 8,984 stars · 1,098 forks · Updated Feb 9, 2026

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 6,175 stars · 819 forks · Updated Feb 3, 2026

PyTorch native quantization and sparsity for training and inference

Python · 2,687 stars · 429 forks · Updated Feb 14, 2026

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Python · 6,180 stars · 569 forks · Updated Aug 22, 2025

A library for unit scaling in PyTorch

Jupyter Notebook · 133 stars · 12 forks · Updated Jul 11, 2025

Accessible large language models via k-bit quantization for PyTorch.

Python · 7,953 stars · 822 forks · Updated Feb 15, 2026