KarmaD7
  • Tsinghua University
  • Beijing, China

Highlights

  • Pro

Starred repositories

Succinct Data Structure Library 2.0

C++ 2,285 352 Updated Jun 2, 2023

Supercharge Your LLM with the Fastest KV Cache Layer

Python 5,934 696 Updated Nov 7, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 20,075 3,317 Updated Nov 10, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,245 421 Updated Nov 9, 2025

Virtuoso is a fast, accurate and versatile simulation framework designed for virtual memory research. Virtuoso uses a new simulation methodology for estimating OS overheads and models diverse VM de…

C++ 75 14 Updated Oct 15, 2025

Calculating the actual value of your job beyond just salary

TypeScript 2,805 170 Updated Oct 14, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,451 958 Updated Oct 24, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Python 2,874 305 Updated Mar 10, 2025

Expert Parallelism Load Balancer

Python 1,291 195 Updated Mar 24, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,870 739 Updated Oct 15, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,706 979 Updated Nov 6, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,930 286 Updated May 15, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,855 897 Updated Sep 30, 2025

JavaScript 67 2 Updated Jun 30, 2025

Multicore in-memory storage engine

C++ 395 119 Updated Oct 10, 2017

A year has gone by. Where did all the money you spent in the Tsinghua canteens go?

Python 469 78 Updated Dec 23, 2024

Jupyter Notebook 6 Updated Dec 17, 2024

A Toolkit for Programming Parallel Algorithms on Shared-Memory Multicore Machines

C++ 394 75 Updated Sep 18, 2025

A curated list of awesome smartnic tutorials, papers and projects.

282 37 Updated Oct 27, 2025

A rust-based benchmark for BlueField SmartNICs.

Rust 30 4 Updated Jul 5, 2023

A collection of awesome researchers and papers about disaggregated memory.

173 14 Updated Oct 14, 2025

Arbitrary offloads for RDMA NICs

C 98 21 Updated Apr 25, 2022

brpc is an industrial-grade RPC framework written in C++, often used in high-performance systems such as search, storage, machine learning, advertisement, and recommendation. "brpc" mea…

C++ 17,361 4,068 Updated Nov 9, 2025

A summary of my design philosophy (mostly in Chinese)

Python 491 108 Updated Nov 10, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 15,529 1,124 Updated Nov 10, 2025

Example code for using DC QPs to provide RDMA READ and WRITE operations to remote GPU memory

C 145 36 Updated Jul 30, 2024

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

4,971 534 Updated Sep 25, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 62,614 11,139 Updated Nov 10, 2025