Starred repositories
Open Source DeepWiki: AI-Powered Wiki Generator for GitHub/GitLab/Bitbucket Repositories. Join the Discord: https://discord.gg/gMwThUMeme
Large Language Model Text Generation Inference
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
[ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Fast and memory-efficient exact attention
A framework for few-shot evaluation of language models.
🌐 Jekyll is a blog-aware static site generator in Ruby
A lightweight data processing framework built on DuckDB and 3FS.
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
Constrained Decoding of Diffusion LLMs with Context-Free Grammars.
Heterogeneous AI Computing Virtualization Middleware (Project under CNCF)
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
A curated list of papers related to constrained decoding of LLMs, along with their relevant code and resources.
The most open diffusion language model for code generation — releasing pretraining, evaluation, inference, and checkpoints.
Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache).
Official implementation of "DPad: Efficient Diffusion Language Models with Suffix Dropout"
[NeurIPS'25] dKV-Cache: The Cache for Diffusion Language Models
A collection of papers on discrete diffusion models
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
📚 A curated list of Awesome Diffusion Inference Papers with Code: Sampling, Cache, Quantization, Parallelism, etc. 🎉
📚 A curated list of Awesome LLM/VLM Inference Papers with Code: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉
Official PyTorch implementation for ICLR 2025 paper "Scaling up Masked Diffusion Models on Text"
Official PyTorch implementation for "Large Language Diffusion Models"
⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)
A library for efficient similarity search and clustering of dense vectors.