dongxiao92
  • NVIDIA
  • Shanghai, China


Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,931 286 Updated May 15, 2025

Proxy: Next Generation Polymorphism in C++

C++ 2,998 202 Updated Nov 11, 2025

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

C++ 483 249 Updated Nov 12, 2025

Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.

Jupyter Notebook 1,585 97 Updated Feb 16, 2024

GPUOCelot: A dynamic compilation framework for PTX

C++ 288 69 Updated Jul 31, 2023

程序员延寿指南 | A programmer's guide to living longer

34,524 2,367 Updated May 19, 2025

Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA

Cuda 35 11 Updated Jul 28, 2020

Yinghan's Code Sample

Cuda 355 62 Updated Jul 25, 2022

Making large AI models cheaper, faster and more accessible

Python 41,235 4,538 Updated Nov 12, 2025

A library of GPU kernels for sparse matrix operations.

C++ 275 54 Updated Nov 24, 2020

A DNN inference latency prediction toolkit for accurately modeling and predicting the latency on diverse edge devices.

Python 360 61 Updated Jul 30, 2024

Open source code for AlphaFold 2.

Python 13,951 2,504 Updated Oct 31, 2025

GVProf: A Value Profiler for GPU-based Clusters

Python 52 10 Updated Mar 24, 2024

A primitive library for neural networks

C++ 1,368 222 Updated Nov 24, 2024

ppl.cv is a high-performance image processing library of openPPL supporting various platforms.

C++ 512 122 Updated Oct 30, 2024

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda 1,801 463 Updated Oct 9, 2023

cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples showing how to use it

C++ 638 134 Updated Nov 7, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,763 1,522 Updated Nov 10, 2025

Must read research papers and links to tools and datasets that are related to using machine learning for compilers and systems optimisation

1,618 172 Updated Sep 12, 2025

clang & llvm examples, e.g. AST Interpreter, Function Pointer Analysis, Value Range Analysis, Data-Flow Analysis, Andersen Pointer Analysis, LLVM Backend...

C++ 276 57 Updated Apr 16, 2022

🍉 Study notes on deploying TNN on mobile devices, with support for Android and iOS.

C++ 74 20 Updated Apr 25, 2021

TNN: developed by Tencent Youtu Lab and Guangying Lab, a uniform deep learning inference framework for mobile, desktop, and server. TNN is distinguished by several outstanding features, including its…

C++ 4,589 771 Updated May 9, 2025

The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs

C++ 1,331 195 Updated Apr 14, 2025

Bolt is a deep learning library with high performance and heterogeneous flexibility.

C++ 953 162 Updated Apr 11, 2025

Place for meetup slides

140 16 Updated Oct 11, 2020

flexible-gemm conv of deepcore

C 17 14 Updated Dec 2, 2019

Speaker materials from CppCon 2014

C++ 2,300 395 Updated Jan 10, 2016

High-efficiency floating-point neural network inference operators for mobile, server, and Web

C 2,161 449 Updated Nov 12, 2025

Explained QNNPACK Implementation

C 21 10 Updated Sep 20, 2025

Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs

C++ 16 2 Updated Feb 28, 2019