Starred repositories
DeepEP: an efficient expert-parallel communication library
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven expert parallelism)
magic-trace collects and displays high-resolution traces of what a process is doing
"AI-Trader: Can AI Beat the Market?" Live Trading Bench: https://hkuds.github.io/AI-Trader/
Production-Grade Container Scheduling and Management
Multi-agent framework, runtime and control plane. Built for speed, privacy, and scale.
Model Express is a Rust-based component that sits alongside existing model inference systems to speed up their startup and improve overall performance.
GPUd automates monitoring, diagnostics, and issue identification for GPUs
The official implementation of "AutoPR: Let's Automate Your Academic Promotion!"
A transformer-based LLM, written entirely in Rust
Burn is a next-generation tensor library and deep learning framework that doesn't compromise on flexibility, efficiency, or portability.
A workload for deploying LLM inference services on Kubernetes
DAOS Storage Stack (client libraries, storage engine, control plane)
A modern, high-performance, open-source message queuing system
An exabyte-scale, multi-region distributed file system
nanobind: tiny and efficient C++/Python bindings
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
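A minimal client sketch against a running Triton server, assuming a hypothetical model "my_model" that maps an FP32 tensor "INPUT0" to "OUTPUT0" (the model and tensor names are placeholders, not from the repo):

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server's HTTP endpoint (default port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the request: one FP32 input tensor of shape [1, 4].
inp = httpclient.InferInput("INPUT0", [1, 4], "FP32")
inp.set_data_from_numpy(np.ones((1, 4), dtype=np.float32))

# Run inference and pull the named output back as a NumPy array.
result = client.infer(model_name="my_model", inputs=[inp])
print(result.as_numpy("OUTPUT0"))
```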
Python quantitative trading strategies including VIX Calculator, Pattern Recognition, Commodity Trading Advisor, Monte Carlo, Options Straddle, Shooting Star, London Breakout, Heikin-Ashi, Pair Tra…
Qlib is an AI-oriented quant investment platform that aims to use AI tech to empower quant research, from exploring ideas to implementing them in production. Qlib supports diverse ML modeling paradigms, i…
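A minimal sketch of Qlib's data API, assuming the public CN daily data bundle has already been downloaded to the default path (the path and instrument are illustrative):

```python
import qlib
from qlib.constant import REG_CN
from qlib.data import D

# Point Qlib at a previously prepared data bundle (path is an assumption).
qlib.init(provider_uri="~/.qlib/qlib_data/cn_data", region=REG_CN)

# Pull daily close prices for one instrument over a sample window.
df = D.features(
    instruments=["SH600000"],
    fields=["$close"],
    start_time="2020-01-01",
    end_time="2020-12-31",
)
print(df.head())
```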
Offline optimization of your disaggregated Dynamo graph
An external log connector example for LMCache
A lightweight, powerful framework for multi-agent workflows
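Assuming this entry is the OpenAI Agents SDK (whose tagline matches), a minimal sketch of defining and running a single agent; the name and instructions are illustrative:

```python
from agents import Agent, Runner

# Define an agent with a name and system-style instructions.
agent = Agent(
    name="Assistant",
    instructions="You are a concise, helpful assistant.",
)

# Run it synchronously on one user message (requires OPENAI_API_KEY).
result = Runner.run_sync(agent, "Write a haiku about recursion.")
print(result.final_output)
```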
Renderer for the harmony response format to be used with gpt-oss
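A minimal sketch of rendering a conversation with the openai-harmony package, based on its published quickstart (treat exact names as assumptions if your version differs):

```python
from openai_harmony import (
    Conversation,
    HarmonyEncodingName,
    Message,
    Role,
    load_harmony_encoding,
)

encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

# Build a short conversation and render it to the token sequence a
# gpt-oss model expects for the next assistant completion.
convo = Conversation.from_messages([
    Message.from_role_and_content(Role.USER, "What is 2 + 2?"),
])
tokens = encoding.render_conversation_for_completion(convo, Role.ASSISTANT)
print(tokens[:16])
```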
A throughput-oriented high-performance serving framework for LLMs
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
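A minimal sketch of the high-level Python LLM API the description refers to; the model name is a placeholder:

```python
from tensorrt_llm import LLM, SamplingParams

# Build an engine for a (placeholder) Hugging Face model.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
params = SamplingParams(temperature=0.8, max_tokens=32)

# Generate completions for a batch of prompts.
for output in llm.generate(["The capital of France is"], params):
    print(output.outputs[0].text)
```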