Tom-CaoZH
👋
Focusing

Starred repositories

cache_ext is a framework to customize Linux page cache eviction policies using BPF. Appeared in SOSP 2025.

Jupyter Notebook 23 6 Updated Oct 28, 2025

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 532 47 Updated Oct 27, 2025

An RL framework for multi-LLM agent systems

Python 38 4 Updated Oct 21, 2025

Building the Virtuous Cycle for AI-driven LLM Systems

Python 74 12 Updated Oct 28, 2025

Contexts Optical Compression

Python 18,683 1,250 Updated Oct 25, 2025

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

Python 2,150 133 Updated Oct 29, 2025

The best ChatGPT that $100 can buy.

Python 34,485 3,854 Updated Oct 29, 2025

Universal memory layer for AI Agents; Announcing OpenMemory MCP - local and secure memory management.

Python 42,173 4,538 Updated Oct 30, 2025

Cluster data collected from production clusters at Alibaba for cluster management research

Jupyter Notebook 1,868 438 Updated Oct 13, 2025

Pie: Programmable LLM Serving

Python 50 10 Updated Oct 30, 2025

Post-training with Tinker

Python 1,325 95 Updated Oct 30, 2025

APEX+ is an LLM Serving Simulator

Python 37 6 Updated Jun 16, 2025

Ring attention implementation with flash attention

Python 903 87 Updated Sep 10, 2025

slime is an LLM post-training framework for RL Scaling.

Python 2,310 233 Updated Oct 30, 2025

Efficient Triton implementation of Native Sparse Attention.

Python 241 17 Updated May 23, 2025

NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer

Cuda 142 11 Updated Sep 18, 2025

Checkpoint-engine is a simple middleware to update model weights in LLM inference engines

Python 795 59 Updated Oct 30, 2025

NSA Triton Kernels written with GPT5 and Opus 4.1

Python 64 5 Updated Aug 12, 2025
Python 120 6 Updated Aug 18, 2025

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 3,684 282 Updated Oct 30, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 19,021 1,890 Updated Oct 23, 2025

Renderer for the harmony response format to be used with gpt-oss

Rust 3,954 222 Updated Aug 15, 2025

An efficient implementation of the NSA (Native Sparse Attention) kernel

Python 123 5 Updated Jun 24, 2025
C++ 309 26 Updated Oct 1, 2025
429 9 Updated Aug 10, 2025

RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

Python 3,327 340 Updated Oct 29, 2025