Stars
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Optimized primitives for collective multi-GPU communication
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
This is a repo for a mini distributed warehouse management system based on gRPC
LITE Kernel RDMA Support for Datacenter Applications. SOSP 2017.
Fastsocket is a highly scalable socket and its underlying networking implementation for the Linux kernel. With straight linear scalability, Fastsocket can provide extremely good performance in multi…
mimalloc is a compact general purpose allocator with excellent performance.
Pond: CXL-Based Memory Pooling Systems for Cloud Platforms (ASPLOS'23)
The Artifact Evaluation Version of SOSP Paper #19
A library version of FreeBSD's TCP/IP stack plus extras
AIFM: High-Performance, Application-Integrated Far Memory
Tools for profiling the Linux network stack.
Official libunwind GitHub repo (in need of a new or additional maintainer; email or open an issue if interested)
Fork of the official iperf-3.1.3 that runs on the DPDK user-space TCP/IP stack (ANS).
iperf3: A TCP, UDP, and SCTP network bandwidth measurement tool
nettrace is an eBPF-based tool for tracing network packets and diagnosing network problems.
BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more