Starred repositories
WeDLM: The fastest diffusion language model with standard causal attention and native KV cache compatibility, delivering real speedups over vLLM-optimized baselines.
AirLLM: 70B inference on a single 4GB GPU
A throughput-oriented high-performance serving framework for LLMs
[ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models
ToolOrchestra is an end-to-end RL training framework for orchestrating tools and agentic workflows.
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
A CLI to estimate inference memory requirements for Hugging Face models, written in Python.
CUGA is an open-source generalist agent for the enterprise, supporting complex task execution on web and APIs, OpenAPI/MCP integrations, composable architecture, reasoning modes, and policy-aware f…
A PyTorch native platform for training generative AI models
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
SGLang is a high-performance serving framework for large language models and multimodal models.
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
A low-latency & high-throughput serving engine for LLMs
A framework for few-shot evaluation of language models.
🚀 Efficient implementations of state-of-the-art linear attention models
A compilation of the best multi-agent papers
Recipe for a General, Powerful, Scalable Graph Transformer
Chimera: State Space Models Beyond Sequences
Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers"
Code for the paper "Heta: Distributed Training of Heterogeneous Graph Neural Networks"