dongxiao92

dongxiao dongxiao92

Architect at NVIDIA

12 followers · 73 following

NVIDIA
Shanghai, China

Achievements

Stars

deepseek-ai / open-infra-index

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,931 286 Updated May 15, 2025

microsoft / proxy

Proxy: Next Generation Polymorphism in C++

C++ 2,998 202 Updated Nov 11, 2025

ROCm / composable_kernel

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

C++ 483 249 Updated Nov 12, 2025

ELS-RD / kernl

Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.

Jupyter Notebook 1,585 97 Updated Feb 16, 2024

gtcasl / gpuocelot

GPUOCelot: A dynamic compilation framework for PTX

C++ 288 69 Updated Jul 31, 2023

geekan / HowToLiveLonger

A programmer's guide to living longer

34,524 2,367 Updated May 19, 2025

codyjrivera / tsm2x-imp

Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA

Cuda 35 11 Updated Jul 28, 2020

Yinghan-Li / YHs_Sample

Yinghan's Code Sample

Cuda 355 62 Updated Jul 25, 2022

hpcaitech / ColossalAI

Making large AI models cheaper, faster and more accessible

Python 41,235 4,538 Updated Nov 12, 2025

google-research / sputnik

A library of GPU kernels for sparse matrix operations.

C++ 275 54 Updated Nov 24, 2020

microsoft / nn-Meter

A DNN inference latency prediction toolkit for accurately modeling and predicting the latency on diverse edge devices.

Python 360 61 Updated Jul 30, 2024

google-deepmind / alphafold

Open source code for AlphaFold 2.

Python 13,951 2,504 Updated Oct 31, 2025

GVProf / GVProf

GVProf: A Value Profiler for GPU-based Clusters

Python 52 10 Updated Mar 24, 2024

OpenPPL / ppl.nn

A primitive library for neural network

C++ 1,368 222 Updated Nov 24, 2024

OpenPPL / ppl.cv

ppl.cv is a high-performance image processing library of openPPL supporting various platforms.

C++ 512 122 Updated Oct 30, 2024

NVIDIA / cub

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda 1,801 463 Updated Oct 9, 2023

NVIDIA / cudnn-frontend

cudnn_frontend provides a c++ wrapper for the cudnn backend API and samples on how to use it

C++ 638 134 Updated Nov 7, 2025

NVIDIA / cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,763 1,522 Updated Nov 10, 2025

zwang4 / awesome-machine-learning-in-compilers

Must read research papers and links to tools and datasets that are related to using machine learning for compilers and systems optimisation

1,618 172 Updated Sep 12, 2025

lijiansong / clang-llvm-tutorial

clang & llvm examples, e.g. AST Interpreter, Function Pointer Analysis, Value Range Analysis, Data-Flow Analysis, Andersen Pointer Analysis, LLVM Backend...

C++ 276 57 Updated Apr 16, 2022

cmdbug / TNN_Demo

🍉 Study notes on deploying TNN on mobile devices, with support for Android and iOS.

C++ 74 20 Updated Apr 25, 2021

Tencent / TNN

TNN: developed by Tencent Youtu Lab and Guangying Lab, a uniform deep learning inference framework for mobile, desktop and server. TNN is distinguished by several outstanding features, including its…

C++ 4,589 771 Updated May 9, 2025

tensor-compiler / taco

The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs

C++ 1,331 195 Updated Apr 14, 2025

huawei-noah / bolt

Bolt is a deep learning library with high performance and heterogeneous flexibility.

C++ 953 162 Updated Apr 11, 2025

tvmai / meetup-slides

Place for meetup slides

140 16 Updated Oct 11, 2020

XiuYuLi / flexible-gemm

flexible-gemm conv of deepcore

C 17 14 Updated Dec 2, 2019

CppCon / CppCon2014

Speaker materials from CppCon 2014

C++ 2,300 395 Updated Jan 10, 2016

google / XNNPACK

High-efficiency floating-point neural network inference operators for mobile, server, and Web

C 2,161 449 Updated Nov 12, 2025

zhenhuaw-me / qnnpack

Forked from pytorch/QNNPACK

Explained QNNPACK Implementation

C 21 10 Updated Sep 20, 2025

chenxuhao / caffe-escoin

Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs

C++ 16 2 Updated Feb 28, 2019