Stars
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
Span Queries: What if we had a way to plan and optimize GenAI like we do for SQL?
GenAI inference performance benchmarking tool
llm-d benchmark scripts and tooling
A lightweight vLLM simulator for mocking out replicas.
Achieve state-of-the-art inference performance with modern accelerators on Kubernetes
Gateway API Inference Extension
A high-throughput and memory-efficient inference and serving engine for LLMs
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
LangChain for Go, the easiest way to write LLM-based programs in Go
GUI tool for visualizing result data from a de Bruijn sequence complexity distribution study
KubeStellar - a flexible solution for multi-cluster configuration management for edge, multi-cloud, and hybrid cloud
The main repository for the Multicluster Global Hub