Stars
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable (see the usage sketch after this list).
GPUOCelot: A dynamic compilation framework for PTX
A programmer's guide to living longer
Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA
Making large AI models cheaper, faster and more accessible
A library of GPU kernels for sparse matrix operations.
A DNN inference latency prediction toolkit for accurately modeling and predicting the latency on diverse edge devices.
Open source code for AlphaFold 2.
ppl.cv is a high-performance image processing library from openPPL that supports various platforms.
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples showing how to use it
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Must-read research papers and links to tools and datasets related to using machine learning for compilers and systems optimisation
clang & llvm examples, e.g. AST Interpreter, Function Pointer Analysis, Value Range Analysis, Data-Flow Analysis, Andersen Pointer Analysis, LLVM Backend...
TNN: developed by Tencent Youtu Lab and Guangying Lab, a uniform deep learning inference framework for mobile, desktop, and server. TNN is distinguished by several outstanding features, including its…
The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
Bolt is a deep learning library with high performance and heterogeneous flexibility.
High-efficiency floating-point neural network inference operators for mobile, server, and Web
zhenhuaw-me / qnnpack: forked from pytorch/QNNPACK, an explained QNNPACK implementation
Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs
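To make Kernl's "single line of code" claim from the entry above concrete, here is a minimal usage sketch. The import path and function name are recalled from the project's README and should be treated as assumptions, not a verified API; the model name and input shapes are illustrative.

```python
# Minimal sketch of optimizing a Hugging Face transformer with Kernl.
# kernl.model_optimization.optimize_model is assumed from the project's README.
import torch
from transformers import AutoModel
from kernl.model_optimization import optimize_model  # assumed entry point

# Load a standard PyTorch transformer and move it to the GPU.
model = AutoModel.from_pretrained("bert-base-uncased").eval().cuda()

# The "single line": replace eligible ops with Kernl's fused Triton kernels in place.
optimize_model(model)

# Run inference under fp16 autocast, the setting the advertised speedups target.
inputs = {
    "input_ids": torch.ones(1, 128, dtype=torch.long, device="cuda"),
    "attention_mask": torch.ones(1, 128, dtype=torch.long, device="cuda"),
}
with torch.inference_mode(), torch.cuda.amp.autocast():
    outputs = model(**inputs)
```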