
A tiny deep learning training framework implemented from scratch in C++ that follows PyTorch's API.

C++ 124 23 Updated Nov 1, 2025

An open-source cross-platform alternative to AirDrop

Dart 70,290 3,774 Updated Nov 14, 2025

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 2,677 263 Updated Nov 6, 2025

High performance, low latency market trading application written in C++

C++ 8 6 Updated Jan 5, 2025

Building Low Latency Applications with CPP by Packt Publishing

HTML 568 170 Updated May 9, 2025

An Easy-to-understand TensorOp Matmul Tutorial

C++ 393 49 Updated Oct 10, 2025

CUDA Embedding Lookup Kernel Library

Cuda 33 4 Updated Oct 21, 2025

LeetGPU Challenges

Python 463 33 Updated Nov 11, 2025

Distributed MoE in a Single Kernel [NeurIPS '25]

Cuda 92 13 Updated Sep 30, 2025

Fast and memory-efficient C++ flat hash table/map/set

C++ 655 67 Updated Nov 8, 2025

CPU inference for the DeepSeek family of large language models in C++

C++ 313 34 Updated Oct 2, 2025

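An app for learning English vocabulary through real context in movies, American TV series, or documents, so you memorize words in authentic situations and learn more efficiently.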

Kotlin 2,824 183 Updated Oct 15, 2025

Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies.

C 145 15 Updated Jul 5, 2025

Static suckless single batch CUDA-only qwen3-0.6B mini inference engine

Cuda 512 41 Updated Sep 8, 2025

Learning assembly for Linux x86_64

Assembly 3,305 365 Updated Oct 30, 2025

Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

C++ 531 50 Updated Sep 13, 2025

Copy code from CSDN with one click, no login required

HTML 55 8 Updated Oct 29, 2023

MKEditor - Markdown with style.

TypeScript 223 10 Updated Oct 21, 2025

Open-source whiteboard tool (SaaS): an all-in-one whiteboard with mind maps, flowcharts, freehand drawing, and more.

TypeScript 12,422 988 Updated Nov 10, 2025

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 2,233 287 Updated Nov 14, 2025

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 78,664 11,641 Updated Nov 13, 2025

KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA (+ more DSLs)

Python 662 86 Updated Nov 15, 2025

Master CUDA by writing CUDA kernels.

C 5 Updated Aug 13, 2025

LLaMA 2 implemented from scratch in PyTorch

Python 358 68 Updated Sep 25, 2023

source code of The Standard C Library, by Plauger

C 334 131 Updated Oct 6, 2016

TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.

Cuda 100 7 Updated Jun 28, 2025

🚧 An experimental communicating attention kernel based on DeepEP.

Cuda 34 Updated Jul 29, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,721 988 Updated Nov 6, 2025

A simple, lightweight PowerShell script to remove pre-installed apps, disable telemetry, as well as perform various other changes to customize, declutter and improve your Windows experience. Win11D…

PowerShell 33,002 1,291 Updated Nov 14, 2025