asuan99's starred repositories

WeDLM: The fastest diffusion language model with standard causal attention and native KV cache compatibility, delivering real speedups over vLLM-optimized baselines.

Python 550 37 Updated Jan 6, 2026

AirLLM 70B inference with single 4GB GPU

Jupyter Notebook 7,024 627 Updated Sep 3, 2025

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook 936 46 Updated Oct 29, 2025

[ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models

Python 17 2 Updated Nov 4, 2025

ToolOrchestra is an end-to-end RL training framework for orchestrating tools and agentic workflows.

Python 553 70 Updated Dec 23, 2025

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.

Python 8,501 835 Updated Jan 8, 2026

A CLI to estimate inference memory requirements for Hugging Face models, written in Python.

Python 259 19 Updated Jan 10, 2026

CUGA is an open-source generalist agent for the enterprise, supporting complex task execution on web and APIs, OpenAPI/MCP integrations, composable architecture, reasoning modes, and policy-aware f…

Python 637 98 Updated Jan 12, 2026

A PyTorch native platform for training generative AI models

Python 4,952 663 Updated Jan 12, 2026

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 2,898 317 Updated Jan 6, 2026

[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI

Python 469 62 Updated Jan 3, 2026

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python 7,167 395 Updated Jul 11, 2024

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 22,317 4,022 Updated Jan 12, 2026

[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.

Python 500 73 Updated Aug 1, 2024

[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

C++ 802 56 Updated Mar 6, 2025

A low-latency & high-throughput serving engine for LLMs

Python 466 61 Updated Jan 8, 2026
Python 11 Updated Aug 16, 2025

A framework for few-shot evaluation of language models.

Python 11,158 2,955 Updated Jan 7, 2026
Python 204 15 Updated Dec 11, 2024
Python 65 4 Updated Sep 11, 2024

🚀 Efficient implementations of state-of-the-art linear attention models

Python 4,225 349 Updated Jan 12, 2026

A compilation of the best multi-agent papers

TeX 1,147 97 Updated Jan 8, 2026
Python 8 Updated May 28, 2024

Recipe for a General, Powerful, Scalable Graph Transformer

Python 819 146 Updated Jul 4, 2024

[Preprint] Graph State Space Convolution (GSSC)

Python 14 2 Updated Jun 11, 2024

Chimera: State Space Models Beyond Sequences

Jupyter Notebook 9 Updated Oct 15, 2025

Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers"

Python 170 14 Updated Jan 30, 2025

code for the paper "Heta: Distributed Training of Heterogeneous Graph Neural Networks"

Python 2 Updated Jun 12, 2025
Python 49 18 Updated Apr 11, 2025