Stars
FastAPI framework, high performance, easy to learn, fast to code, ready for production
💥💻💥 A data-parallel functional programming language
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient Multi-head Latent Attention Kernels
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Mirror of the Glasgow Haskell Compiler. Please submit issues and patches to GHC's Gitlab instance (https://gitlab.haskell.org/ghc/ghc). First time contributors are encouraged to get started with th…
Universal LLM Deployment Engine with ML Compilation
eBPF Developer Tutorial: Learning eBPF Step by Step with Examples
A debugging and profiling tool that can trace and visualize python code execution
A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment…
Neural Code Intelligence Survey 2024; Reading lists and resources
Fast and memory-efficient exact attention
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Ongoing research training transformer models at scale
This is the Rust course used by the Android team at Google. It provides you the material to quickly teach Rust.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
A high-throughput and memory-efficient inference and serving engine for LLMs
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
scikit-learn: machine learning in Python