Starred repositories
cache_ext is a framework for customizing Linux page cache eviction policies using BPF. Appeared in SOSP 2025.
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
An RL framework for multi-LLM agent systems
Building the Virtuous Cycle for AI-driven LLM Systems
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
Universal memory layer for AI Agents; Announcing OpenMemory MCP - local and secure memory management.
Cluster data collected from production clusters at Alibaba for cluster-management research
Post-training with Tinker
Ring attention implementation with flash attention
slime is an LLM post-training framework for RL scaling.
Efficient Triton implementation of Native Sparse Attention.
NVSHMEM-Tutorial: Build a DeepEP-like GPU Buffer
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
NSA Triton Kernels written with GPT5 and Opus 4.1
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI.
Renderer for the harmony response format to be used with gpt-oss
An efficient implementation of the NSA (Native Sparse Attention) kernel
RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.