merrymercy

Lianmin Zheng merrymercy

dev efficiency engineer

3.7k followers · 169 following

xAI
Bay Arena, CA
http://lmzheng.net
@lm_zheng

Achievements

x4 x4 x4

Stars

radixark / miles

Python · 317 stars · 23 forks · Updated Nov 28, 2025

sgl-project / sglang-jax

JAX backend for SGL

Python · 185 stars · 36 forks · Updated Nov 27, 2025

deepseek-ai / DeepSeek-V3.2-Exp

Python · 1,055 stars · 76 forks · Updated Nov 18, 2025

zai-org / GLM-4.5

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Python · 3,250 stars · 333 forks · Updated Oct 11, 2025

MoonshotAI / Kimi-K2

Kimi K2 is the large language model series developed by Moonshot AI team

9,609 stars · 682 forks · Updated Nov 7, 2025

sgl-project / ome

OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)

Go · 322 stars · 46 forks · Updated Nov 28, 2025

sgl-project / genai-bench

Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems.

Python · 234 stars · 35 forks · Updated Nov 25, 2025

THUDM / slime

slime is an LLM post-training framework for RL Scaling.

Python · 2,602 stars · 289 forks · Updated Nov 28, 2025

ChenmienTan / RL2

Python · 918 stars · 96 forks · Updated Nov 22, 2025

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ · 4,331 stars · 445 forks · Updated Nov 27, 2025

inclusionAI / AReaL

Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

Python · 3,090 stars · 238 forks · Updated Nov 28, 2025

PolyhedraZK / Expander

Expander, an open-source GKR prover designed for scaling large-scale parallel computing.

Rust · 136 stars · 53 forks · Updated Sep 18, 2025

hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python · 63,196 stars · 7,642 forks · Updated Nov 27, 2025

lm-sys / lm-sys.github.io

The source of LMSYS website and blogs

JavaScript · 70 stars · 56 forks · Updated Nov 26, 2025

volcengine / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python · 16,767 stars · 2,667 forks · Updated Nov 27, 2025

deepseek-ai / DeepSeek-V3

Python · 100,414 stars · 16,368 forks · Updated Aug 28, 2025

dropbox / gemlite

Fast low-bit matmul kernels in Triton

Python · 401 stars · 29 forks · Updated Nov 21, 2025

Tencent-Hunyuan / HunyuanVideo

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Python · 11,339 stars · 1,138 forks · Updated Nov 21, 2025

OpenBMB / MiniCPM

MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving 3x+ generation speedup on reasoning tasks

Jupyter Notebook · 8,436 stars · 522 forks · Updated Oct 8, 2025

nunchaku-tech / nunchaku

[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Python · 3,390 stars · 201 forks · Updated Nov 17, 2025

nunchaku-tech / deepcompressor

Model Compression Toolbox for Large Language Models and Diffusion Models

Python · 700 stars · 69 forks · Updated Aug 14, 2025

mlc-ai / xgrammar

Fast, Flexible and Portable Structured Generation

C++ · 1,395 stars · 103 forks · Updated Nov 27, 2025

zhaochenyang20 / Awesome-ML-SYS-Tutorial

My learning notes/codes for ML SYS.

Python · 4,289 stars · 259 forks · Updated Nov 25, 2025

svcaf / howto501c3

Documentation on how to set up a 501c3 nonprofit organization in California, USA

14 stars · 2 forks · Updated Sep 12, 2021

OpenRLHF / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)

Python · 8,474 stars · 820 forks · Updated Nov 9, 2025

thu-ml / SageAttention

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda · 2,731 stars · 272 forks · Updated Nov 6, 2025