- Tsinghua University
- Beijing, China
Starred repositories
Supercharge Your LLM with the Fastest KV Cache Layer
SGLang is a fast serving framework for large language models and vision language models.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Virtuoso is a fast, accurate and versatile simulation framework designed for virtual memory research. Virtuoso uses a new simulation methodology for estimating OS overheads and models diverse VM de…
Calculating the actual value of your job beyond just salary
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
FlashMLA: Efficient Multi-head Latent Attention Kernels
A Toolkit for Programming Parallel Algorithms on Shared-Memory Multicore Machines
A curated list of awesome SmartNIC tutorials, papers and projects.
A Rust-based benchmark for BlueField SmartNICs.
A collection of awesome researchers and papers about disaggregated memory.
brpc is an industrial-grade RPC framework written in C++, often used in high-performance systems such as search, storage, machine learning, advertisement, recommendation, etc. "brpc" mea…
My Design Philosophy Summary (Most of them are in Chinese)
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
example code for using DC QP for providing RDMA READ and WRITE operations to remote GPU memory
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
A high-throughput and memory-efficient inference and serving engine for LLMs