- Worked at Kuaishou, Baidu, Meituan
- Beijing
- https://ageliss.github.io/gqjiang/
Starred repositories
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
slime is an LLM post-training framework for RL scaling.
Adamas: Hadamard Sparse Attention for Efficient Long-context Inference
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI.
Multilingual Document Layout Parsing in a Single Vision-Language Model
[ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without degrading end-to-end metrics across language, image, and video models.
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
Bridge Megatron-Core to Hugging Face/Reinforcement Learning
Train speculative decoding models effortlessly and port them smoothly to SGLang serving (speculative decoding itself is sketched after this list).
Kimi K2 is the large language model series developed by Moonshot AI team
The official repository of the dots.llm1 base and instruct models proposed by rednote-hilab.
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
Scalable toolkit for efficient model reinforcement
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
📚 A curated list of awesome LLM/VLM inference papers with code: FlashAttention, PagedAttention, WINT8/4, parallelism, etc. 🎉
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
verl: Volcano Engine Reinforcement Learning for LLMs
An easy-to-use, scalable, and high-performance RLHF framework built on Ray (PPO, GRPO, REINFORCE++, vLLM, dynamic sampling, async agentic RL).
HArmonizedSS / HASS, forked from SafeAILab/EAGLE
Official implementation of "Learning Harmonized Representations for Speculative Sampling" (HASS)
No fortress, purely open ground. OpenManus is Coming.
Supercharge Your LLM with the Fastest KV Cache Layer (the KV-cache idea behind this is sketched after this list)
Tile primitives for speedy kernels
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
FlashMLA: Efficient Multi-head Latent Attention Kernels
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
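Several of the starred repos (LayerSkip, HASS, the SGLang speculative-decoding trainer) revolve around speculative decoding: a cheap draft model proposes several tokens and the expensive target model verifies them in one pass. Below is a minimal greedy sketch of that loop with toy stand-in functions; `draft_model`, `target_model`, and the integer "tokens" are invented for illustration and are not the API of any repo above.

```python
# Minimal greedy speculative decoding sketch (illustrative only).
# In practice the draft is a small LLM and the target a large LLM,
# and the target verifies all k proposals in a single forward pass.

def draft_model(context):
    # Toy draft: deterministically proposes the next token in a cycle.
    return (context[-1] + 1) % 10

def target_model(context):
    # Toy target: agrees with the draft except on multiples of 4.
    nxt = (context[-1] + 1) % 10
    return nxt if nxt % 4 else (nxt + 1) % 10

def speculative_decode(prompt, new_tokens=8, k=4):
    tokens = list(prompt)
    while new_tokens > 0:
        # 1. Draft proposes k tokens cheaply.
        proposal, ctx = [], list(tokens)
        for _ in range(k):
            ctx.append(draft_model(ctx))
            proposal.append(ctx[-1])
        # 2. Target verifies; keep the longest agreeing prefix.
        accepted = 0
        for i, t in enumerate(proposal):
            expected = target_model(tokens + proposal[:i])
            accepted = i + 1
            if expected != t:
                tokens.append(expected)  # substitute the target's token
                break
            tokens.append(t)
        new_tokens -= accepted  # several tokens committed per target pass
    return tokens

print(speculative_decode([0]))  # -> [0, 1, 2, 3, 5, 6, 7, 9, 1]
```

The payoff is that one target pass can commit several tokens at once while producing exactly the tokens the target alone would have produced; the draft's only job is to guess well enough that the accepted prefix is usually longer than one.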
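LMCache and FlashMLA both build on the KV cache that makes autoregressive attention incremental: keys and values for the prefix are stored once, so each decode step projects only the newest token. Below is a single-head NumPy sketch of that idea; the dimensions, random weights, and `KVCache` class are made up for illustration, and real systems add batching, paging, quantization, and cross-request sharing on top.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy head dimension
Wq, Wk, Wv = (rng.standard_normal((D, D)) for _ in range(3))

def attend(q, K, V):
    # Scaled dot-product attention for a single query vector.
    scores = K @ q / np.sqrt(D)
    weights = np.exp(scores - scores.max())
    return (weights / weights.sum()) @ V

class KVCache:
    """Stores per-token keys/values so each decode step projects only the
    newest token and reuses the prefix's K/V instead of recomputing them."""
    def __init__(self):
        self.K = np.empty((0, D))
        self.V = np.empty((0, D))

    def step(self, x):
        # Append this token's key/value, then attend over the full prefix.
        self.K = np.vstack([self.K, x @ Wk])
        self.V = np.vstack([self.V, x @ Wv])
        return attend(x @ Wq, self.K, self.V)

cache = KVCache()
for _ in range(4):                    # stand-in for a 4-token decode loop
    out = cache.step(rng.standard_normal(D))
print(cache.K.shape)                  # (4, 8): one cached K row per token
```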