Stars
CUDA benchmarks for measuring GPU utilization and interference
✨ Kubectl plugin to create manifests with LLMs
A throughput-oriented high-performance serving framework for LLMs
Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
AIInfra (AI Infrastructure) covers the full AI system stack, from low-level hardware such as chips up to the software layers that support training and inference of large AI models.
SGLang is a fast serving framework for large language models and vision language models.
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑🔬
A comprehensive software toolkit for the measurement, analysis, and visualization of energy use, power draw, hardware performance, and carbon emissions across AI workloads.
PaSa -- an advanced paper search agent powered by large language models. It can autonomously make a series of decisions, including invoking search tools, reading papers, and selecting relevant refe…
Making large AI models cheaper, faster and more accessible
Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite
A Framework of Small-scale Large Multimodal Models
LIBSVM -- A Library for Support Vector Machines