LLMRouter: An Open-Source Library for LLM Routing

Python 964 81 Updated Jan 6, 2026

The simplest implementation of Pensieve (SIGCOMM '17) via state-of-the-art RL algorithms, including PPO, DQN, and SAC, with support for both TensorFlow and PyTorch.

DIGITAL Command Language 86 40 Updated Jan 18, 2025

My learning notes for ML SYS.

Python 4,948 320 Updated Jan 7, 2026

A lightweight vLLM simulator, for mocking out replicas.

Go 76 53 Updated Jan 1, 2026

LLM serving cluster simulator

Jupyter Notebook 130 13 Updated Apr 25, 2024

Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling

Go 136 33 Updated Jan 7, 2026

Distributed Compiler based on Triton for Parallel Systems

Python 1,307 117 Updated Dec 27, 2025

AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming

Python 144 29 Updated Dec 30, 2025

Easy design, testing, and deployment of optical data center networks for everyone.

C++ 67 8 Updated Dec 17, 2025

Asterinas is a secure, fast, and general-purpose OS kernel, written in Rust and providing Linux-compatible ABI.

Rust 4,229 257 Updated Jan 7, 2026

A Pythonic framework to simplify AI service building

Python 2,803 193 Updated Jan 6, 2026

Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"

Linear Programming 75 17 Updated Oct 15, 2025

Basically Heterogeneous Inference

Python 115 4 Updated Oct 29, 2025

TensorFlow-based neural network library

Python 9,898 1,301 Updated Aug 4, 2025

⚡️Open platform for intelligent modules powered by agents that share capabilities with each other. Each module brings deep domain expertise to solve complex problems together.

Python 27 5 Updated Apr 18, 2025

Gateway API Benchmarks provides a common set of tests to evaluate a Gateway API implementation.

Go 525 29 Updated Dec 29, 2025

NGINX Lua plugin for adaptive concurrency control used to handle overload in services

Lua 14 2 Updated Dec 30, 2022

This repo is an unofficial go-gRPC implementation of DAGOR, the WeChat microservice overload control. It is part of an effort to compare DAGOR with our design, Rajomon.

Go 3 1 Updated Mar 1, 2025

This is a forked repo of the unofficial implementation of Breakwater by Paul Loh. You can find the official repo below:

Go 1 1 Updated Mar 1, 2025

Rajomon: Decentralized and Coordinated Overload Control for Latency-Sensitive Microservices

Go 11 1 Updated May 19, 2025

An unofficial, go-gRPC implementation of TopFull RL-based rate limiting. You can find the official repo below:

Go 2 1 Updated Mar 1, 2025

Ultralytics YOLO 🚀

Python 50,844 9,813 Updated Jan 7, 2026

Interactive visualizations of the geometric intuition behind diffusion models.

JavaScript 925 44 Updated Jan 5, 2026

Efficient and easy multi-instance LLM serving

Python 520 44 Updated Sep 3, 2025

Non-fork of online boutique

Go 1 Updated Apr 28, 2024

KV cache store for distributed LLM inference

C++ 382 33 Updated Nov 13, 2025

Jupyter Notebook 81 3 Updated Nov 7, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust 5,731 766 Updated Jan 7, 2026

A large-scale simulation framework for LLM inference

Python 512 97 Updated Jul 25, 2025