HTML 15 3 Updated Jun 25, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,120 1,630 Updated Jan 15, 2026

Cosmos-RL is a flexible and scalable Reinforcement Learning framework specialized for Physical AI applications.

Python 286 41 Updated Jan 18, 2026

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python 1,545 130 Updated Jan 17, 2026
Python 679 69 Updated Dec 30, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 66,016 8,020 Updated Jan 17, 2026

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 3,060 316 Updated Jan 17, 2026

TiLamb (Tibetan Large Language Model Base), a Tibetan large language model built by continued pre-training on LLaMA2-7B

34 5 Updated Apr 3, 2024

PyTorch native quantization and sparsity for training and inference

Python 2,630 404 Updated Jan 17, 2026

A Throughput-Optimized Pipeline Parallel Inference System for Large Language Models

Python 47 2 Updated Dec 24, 2025

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 635 136 Updated Jan 18, 2026

🚀🚀 Train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏

Python 37,546 4,483 Updated Jan 18, 2026
C++ 6 1 Updated Mar 11, 2025

My learning notes for ML SYS.

Python 5,072 329 Updated Jan 16, 2026

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,981 936 Updated Jan 16, 2026

Implement Flash Attention using Cute.

Cuda 100 8 Updated Dec 17, 2024

A PyTorch native platform for training generative AI models

Python 4,974 670 Updated Jan 18, 2026

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 9,396 926 Updated Jan 18, 2026

A Chinese-language tutorial on C++ templates. Unlike the well-known book C++ Templates, this series teaches C++ templates as a Turing-complete language in their own right, aiming to help readers fully master meta-programming. (Work in progress)

C++ 10,505 1,621 Updated Aug 20, 2024

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

Python 4,750 272 Updated Jul 18, 2025

"Machine Learning Systems: Design and Implementation" (Chinese version)

TeX 4,743 476 Updated Apr 13, 2024

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

73,373 8,424 Updated Dec 22, 2025

Excalidraw-CN is a whiteboard supporting Chinese handwriting fonts and multiple canvases, based on Excalidraw.

TypeScript 2,328 294 Updated Jan 16, 2024

Notes and interview questions for large language model (LLM) algorithm/application engineers

HTML 11,930 1,200 Updated Apr 30, 2025

FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups at medium batch sizes of up to 16-32 tokens.

Python 986 83 Updated Sep 4, 2024

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 155,362 31,776 Updated Jan 18, 2026

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 22,528 4,085 Updated Jan 19, 2026

hpc-learning

774 47 Updated May 30, 2024