Starred repositories
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
Sandbox implemented in Go, including containers (namespaces, cgroups), ptrace, and seccomp
A tool for bandwidth measurements on NVIDIA GPUs.
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
Benchmarking guide for the Azure AI Infrastructure.
Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud.
GPU & cluster health and performance monitoring solution for OCI
AI fundamentals: GPU architecture, CUDA programming, large-model basics, and AI Agent knowledge
ComScribe is a tool to identify communication among all GPU-GPU and CPU-GPU pairs in a single-node multi-GPU system.
CP-Bench is a PyTorch testing/benchmarking suite to detect AI hardware issues, such as functional reliability, silent data corruption, and performance anomalies
Knowledge and interview questions for large language model (LLM) algorithm and application engineers
Automatically cordon and drain Kubernetes nodes based on node conditions
A Datacenter Scale Distributed Inference Serving Framework
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
DeepEP: an efficient expert-parallel communication library
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
Python tool for converting files and office documents to Markdown.
Instant Kubernetes-Native Application Observability
Build and run containers leveraging NVIDIA GPUs
A unified scheduler for online and offline tasks
MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining