Starred repositories
cache_ext is a framework for customizing Linux page cache eviction policies using BPF. Appeared in SOSP 2025.
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
An RL framework for multi-LLM agent systems
Building the Virtuous Cycle for AI-driven LLM Systems
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
Universal memory layer for AI Agents; Announcing OpenMemory MCP - local and secure memory management.
Cluster data collected from production clusters at Alibaba for cluster-management research
Post-training with Tinker
Ring attention implementation with flash attention
slime is an LLM post-training framework for RL scaling.
Efficient Triton implementation of Native Sparse Attention.
NVSHMEM-Tutorial: Build a DeepEP-like GPU Buffer
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
NSA Triton Kernels written with GPT5 and Opus 4.1
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI.
Renderer for the harmony response format to be used with gpt-oss
An efficient implementation of the NSA (Native Sparse Attention) kernel
RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.