jianfei-wangg

Jianfei Wang jianfei-wangg

HPC(GPU) Software Engineer

21 followers · 6 following

SenseTime
Shanghai, China

Stars

perplexityai / pplx-kernels

Perplexity GPU Kernels

C++ 552 75 Updated Nov 7, 2025

siboehm / SGEMM_CUDA

Fast CUDA matrix multiplication from scratch

Cuda 1,008 152 Updated Sep 2, 2025

zhaochenyang20 / Awesome-ML-SYS-Tutorial

My learning notes for ML SYS.

Python 5,011 325 Updated Jan 8, 2026

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda 3,044 222 Updated Jan 11, 2026

Snoopy1866 / LiTiaotiao-Custom-Rules

10,402 748 Updated Nov 8, 2024

NVIDIA / Megatron-LM

Ongoing research training transformer models at scale

Python 14,863 3,479 Updated Jan 11, 2026

NVIDIA / TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 12,600 2,003 Updated Jan 11, 2026

allenai / OLMo

Modeling, training, eval, and inference code for OLMo

Python 6,282 695 Updated Nov 24, 2025

IST-DASLab / QUIK

Repository for the QUIK project, enabling the use of 4bit kernels for generative inference - EMNLP 2024

C++ 183 13 Updated Apr 16, 2024

bryancatanzaro / trove

Full-speed Array of Structures access

C++ 176 28 Updated Apr 25, 2023

chengzeyi / stable-fast

https://wavespeed.ai/ Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.

Python 1,295 89 Updated Mar 27, 2025

Oneflow-Inc / oneflow

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

C++ 9,385 1,012 Updated Dec 4, 2025

OpenPPL / ppl.cv

ppl.cv is a high-performance image processing library of openPPL supporting various platforms.

C++ 514 126 Updated Oct 30, 2024

OpenPPL / ppl.nn

A primitive library for neural network

C++ 1,369 223 Updated Nov 24, 2024

ROCm / MIOpen

[DEPRECATED] Moved to ROCm/rocm-libraries repo

Assembly 1,187 269 Updated Jan 9, 2026

taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.

C++ 27,873 2,377 Updated Jan 5, 2026

NVIDIA / cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,088 1,618 Updated Jan 9, 2026

Tiltedprogrammer / spec

Benchmarks for partial evaluation of different GPU application scenarios

C++ 4 Updated Jul 19, 2020

SFUMECJF / cmake-examples-Chinese

A quick start to CMake: learn the syntax through worked examples. Read online: https://sfumecjf.github.io/cmake-examples-Chinese/

C++ 2,509 363 Updated Nov 28, 2022

pnnl / TCBNN

Cuda 36 10 Updated Jul 25, 2022

thomas0809 / GraphIE

A Graph-Based Framework for Information Extraction

Python 110 27 Updated Jul 10, 2019

src-d / kmcuda

Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

Jupyter Notebook 839 147 Updated Oct 11, 2022

junegunn / vim-plug

🌺 Minimalist Vim Plugin Manager

Vim Script 35,494 1,955 Updated Nov 6, 2025

daadaada / turingas

Assembler for NVIDIA Volta and Turing GPUs

Python 236 41 Updated Jan 13, 2022

colmap / colmap

COLMAP - Structure-from-Motion and Multi-View Stereo

C++ 10,656 1,863 Updated Jan 11, 2026

Mooophy / Cpp-Primer

C++ Primer 5 answers

C++ 8,295 2,982 Updated Jun 6, 2024

NVlabs / NVBit

298 26 Updated Sep 23, 2025

andreasfertig / cppinsights

C++ Insights - See your source code with the eyes of a compiler

C++ 4,437 262 Updated Jun 26, 2025

fffaraz / awesome-cpp

A curated list of awesome C++ (or C) frameworks, libraries, resources, and shiny things. Inspired by awesome-... stuff.

69,077 8,204 Updated Jan 11, 2026

riscv-software-src / riscv-isa-sim

Spike, a RISC-V ISA Simulator

C 2,989 1,014 Updated Jan 5, 2026
