Starred repositories
WeDLM: The fastest diffusion language model with standard causal attention and native KV cache compatibility, delivering real speedups over vLLM-optimized baselines.
AirLLM: 70B inference on a single 4GB GPU
A throughput-oriented high-performance serving framework for LLMs
[ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models
ToolOrchestra is an end-to-end RL training framework for orchestrating tools and agentic workflows.
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
A CLI to estimate inference memory requirements for Hugging Face models, written in Python.
CUGA is an open-source generalist agent for the enterprise, supporting complex task execution on web and APIs, OpenAPI/MCP integrations, composable architecture, reasoning modes, and policy-aware f…
A PyTorch native platform for training generative AI models
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
SGLang is a high-performance serving framework for large language models and multimodal models.
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
A low-latency & high-throughput serving engine for LLMs
A framework for few-shot evaluation of language models.
🚀 Efficient implementations of state-of-the-art linear attention models
A compilation of the best multi-agent papers
Recipe for a General, Powerful, Scalable Graph Transformer
Chimera: State Space Models Beyond Sequences
Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers"
Code for the paper "Heta: Distributed Training of Heterogeneous Graph Neural Networks"