- University of California, Berkeley
- Berkeley, CA
- https://woosuk.me
- @woosuk_k
Stars
verl: Volcano Engine Reinforcement Learning for LLMs
TPU inference for vLLM, with unified JAX and PyTorch support.
SkyRL: A Modular Full-stack RL Library for LLMs
Post-training with Tinker
[NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning
Open-source implementation of AlphaEvolve
Achieve state-of-the-art inference performance with modern accelerators on Kubernetes
A Datacenter Scale Distributed Inference Serving Framework
ArcticInference: vLLM plugin for high-throughput, low-latency inference
MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining
Democratizing Reinforcement Learning for LLMs
[ACL 2025 Long Main] Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions
NumPy aware dynamic Python compiler using LLVM
[NeurIPS 2025] A simple extension to vLLM that helps you speed up reasoning models without training.
A collection of GPT system prompts and knowledge about prompt injection and prompt leaking.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Fast, Flexible and Portable Structured Generation
Helpful tools and examples for working with flex-attention
Efficient Triton Kernels for LLM Training
A throughput-oriented high-performance serving framework for LLMs
ROCm/vllm
Forked from vllm-project/vllm. A high-throughput and memory-efficient inference and serving engine for LLMs.
PyTorch native quantization and sparsity for training and inference