- University of Illinois Urbana-Champaign
- in/gangmuk-lim-258a76137
- https://gangmuk.github.io
Stars
LLMRouter: An Open-Source Library for LLM Routing
The simplest implementation of Pensieve (SIGCOMM '17) via state-of-the-art RL algorithms, including PPO, DQN, and SAC, with support for both TensorFlow and PyTorch.
My learning notes for ML systems.
A lightweight vLLM simulator for mocking out replicas.
LLM serving cluster simulator
Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling
Distributed Compiler based on Triton for Parallel Systems
AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming
Easy design, testing, and deployment of optical data center networks for everyone.
Asterinas is a secure, fast, and general-purpose OS kernel, written in Rust and providing a Linux-compatible ABI.
A Pythonic framework to simplify AI service building
Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"
TensorFlow-based neural network library
⚡️Open platform for intelligent modules powered by agents that share capabilities with each other. Each module brings deep domain expertise to solve complex problems together.
Gateway API Benchmarks provides a common set of tests to evaluate a Gateway API implementation.
NGINX Lua plugin for adaptive concurrency control, used to handle overload in services.
An unofficial go-gRPC implementation of DAGOR, the WeChat microservice overload control system. It is part of an effort to compare against our design, Rajomon.
pennsail/breakwater-grpc
Forked from lohpaul9/breakwater-grpc. This is a fork of the unofficial implementation of Breakwater by Paul Loh. You can find the official repo below:
Rajomon: Decentralized and Coordinated Overload Control for Latency-Sensitive Microservices
An unofficial go-gRPC implementation of TopFull's RL-based rate limiting. You can find the official repo below:
Interactive visualizations of the geometric intuition behind diffusion models.
Efficient and easy multi-instance LLM serving
KV cache store for distributed LLM inference
A Datacenter Scale Distributed Inference Serving Framework
A large-scale simulation framework for LLM inference