jianfei-wangg
  • SenseTime
  • Shanghai, China


Perplexity GPU Kernels

C++ 553 75 Updated Nov 7, 2025

Fast CUDA matrix multiplication from scratch

Cuda 1,020 153 Updated Sep 2, 2025

My learning notes for ML SYS.

Python 5,072 329 Updated Jan 16, 2026

Tile primitives for speedy kernels

Cuda 3,087 225 Updated Jan 17, 2026

Ongoing research training transformer models at scale

Python 14,945 3,500 Updated Jan 18, 2026

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 12,663 2,018 Updated Jan 18, 2026

Modeling, training, eval, and inference code for OLMo

Python 6,290 697 Updated Nov 24, 2025

Repository for the QUIK project, enabling the use of 4bit kernels for generative inference - EMNLP 2024

C++ 184 13 Updated Apr 16, 2024

Full-speed Array of Structures access

C++ 176 28 Updated Apr 25, 2023

https://wavespeed.ai/ Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.

Python 1,295 89 Updated Mar 27, 2025

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

C++ 9,385 1,012 Updated Dec 4, 2025

ppl.cv is a high-performance image processing library of openPPL supporting various platforms.

C++ 513 125 Updated Oct 30, 2024

A primitive library for neural networks

C++ 1,368 223 Updated Nov 24, 2024

[DEPRECATED] Moved to ROCm/rocm-libraries repo

Assembly 1,189 270 Updated Jan 16, 2026

Productive, portable, and performant GPU programming in Python.

C++ 27,898 2,380 Updated Jan 5, 2026

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,120 1,630 Updated Jan 15, 2026

Benchmarks for partial evaluation of different GPU application scenarios

C++ 4 Updated Jul 19, 2020

A quick-start guide to CMake: learn the syntax through worked examples. Read online: https://sfumecjf.github.io/cmake-examples-Chinese/

C++ 2,508 363 Updated Nov 28, 2022

Cuda 36 10 Updated Jul 25, 2022

A Graph-Based Framework for Information Extraction

Python 110 27 Updated Jul 10, 2019

Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

Jupyter Notebook 842 147 Updated Oct 11, 2022

🌺 Minimalist Vim Plugin Manager

Vim Script 35,505 1,956 Updated Nov 6, 2025

Assembler for NVIDIA Volta and Turing GPUs

Python 236 41 Updated Jan 13, 2022

COLMAP - Structure-from-Motion and Multi-View Stereo

C++ 10,730 1,870 Updated Jan 18, 2026

C++ Primer 5 answers

C++ 8,292 2,982 Updated Jun 6, 2024

298 26 Updated Sep 23, 2025

C++ Insights - See your source code with the eyes of a compiler

C++ 4,437 262 Updated Jun 26, 2025

A curated list of awesome C++ (or C) frameworks, libraries, resources, and shiny things. Inspired by awesome-... stuff.

69,210 8,208 Updated Jan 16, 2026

Spike, a RISC-V ISA Simulator

C 2,993 1,015 Updated Jan 13, 2026