Stars
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Cosmos-RL is a flexible and scalable Reinforcement Learning framework specialized for Physical AI applications.
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
[ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
TiLamb (Tibetan Large Language Model Base): a Tibetan large language model incrementally pretrained from LLaMA2-7B.
PyTorch native quantization and sparsity for training and inference
A Throughput-Optimized Pipeline Parallel Inference System for Large Language Models
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
🚀🚀 Train a small 26M-parameter GPT completely from scratch in just 2 hours! 🌏
My learning notes for machine learning systems (MLSys).
FlashMLA: Efficient Multi-head Latent Attention Kernels
Flash Attention implemented with CuTe.
A PyTorch native platform for training generative AI models
📚 LeetCUDA: modern CUDA learning notes with PyTorch for beginners 🐑, covering 200+ CUDA kernels, Tensor Cores, HGEMM, and FA-2 MMA. 🎉
A Chinese-language tutorial on C++ templates. Unlike the well-known book C++ Templates, this series teaches C++ templates as a Turing-complete language, aiming to help readers achieve a thorough command of metaprogramming. (Work in progress)
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
"Machine Learning Systems: Design and Implementation" (Chinese version)
A course for getting into large language models (LLMs), with roadmaps and Colab notebooks.
Excalidraw-CN is an Excalidraw-based whiteboard that supports Chinese handwriting fonts and multiple canvases.
Notes on knowledge and interview questions for large language model (LLM) algorithm and application engineers.
FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups at batch sizes up to 16-32 tokens.
🤗 Transformers: the model-definition framework for state-of-the-art text, vision, audio, and multimodal machine learning models, for both inference and training.
SGLang is a high-performance serving framework for large language models and multimodal models.