jianfei-wangg
  • SenseTime
  • Shanghai, China

Perplexity GPU Kernels

C++ 552 75 Updated Nov 7, 2025

Fast CUDA matrix multiplication from scratch

Cuda 1,008 152 Updated Sep 2, 2025

My learning notes for ML SYS.

Python 5,011 325 Updated Jan 8, 2026

Tile primitives for speedy kernels

Cuda 3,044 222 Updated Jan 11, 2026

Ongoing research training transformer models at scale

Python 14,863 3,479 Updated Jan 11, 2026

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 12,600 2,003 Updated Jan 11, 2026

Modeling, training, eval, and inference code for OLMo

Python 6,282 695 Updated Nov 24, 2025

Repository for the QUIK project, enabling the use of 4bit kernels for generative inference - EMNLP 2024

C++ 183 13 Updated Apr 16, 2024

Full-speed Array of Structures access

C++ 176 28 Updated Apr 25, 2023

https://wavespeed.ai/ Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.

Python 1,295 89 Updated Mar 27, 2025

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

C++ 9,385 1,012 Updated Dec 4, 2025

ppl.cv is a high-performance image processing library of openPPL supporting various platforms.

C++ 514 126 Updated Oct 30, 2024

A primitive library for neural network

C++ 1,369 223 Updated Nov 24, 2024

[DEPRECATED] Moved to ROCm/rocm-libraries repo

Assembly 1,187 269 Updated Jan 9, 2026

Productive, portable, and performant GPU programming in Python.

C++ 27,873 2,377 Updated Jan 5, 2026

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,088 1,618 Updated Jan 9, 2026

Benchmarks for partial evaluation of different GPU application scenarios

C++ 4 Updated Jul 19, 2020

A quick-start guide to CMake: learn the syntax through worked examples. Read online: https://sfumecjf.github.io/cmake-examples-Chinese/

C++ 2,509 363 Updated Nov 28, 2022
Cuda 36 10 Updated Jul 25, 2022

A Graph-Based Framework for Information Extraction

Python 110 27 Updated Jul 10, 2019

Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

Jupyter Notebook 839 147 Updated Oct 11, 2022

🌺 Minimalist Vim Plugin Manager

Vim Script 35,494 1,955 Updated Nov 6, 2025

Assembler for NVIDIA Volta and Turing GPUs

Python 236 41 Updated Jan 13, 2022

COLMAP - Structure-from-Motion and Multi-View Stereo

C++ 10,656 1,863 Updated Jan 11, 2026

C++ Primer 5 answers

C++ 8,295 2,982 Updated Jun 6, 2024
298 26 Updated Sep 23, 2025

C++ Insights - See your source code with the eyes of a compiler

C++ 4,437 262 Updated Jun 26, 2025

A curated list of awesome C++ (or C) frameworks, libraries, resources, and shiny things. Inspired by awesome-... stuff.

69,077 8,204 Updated Jan 11, 2026

Spike, a RISC-V ISA Simulator

C 2,989 1,014 Updated Jan 5, 2026