🎯
Focusing
  • SEU
  • 14:44 (UTC -12:00)

Python 1 Updated Oct 20, 2025

KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA (+ more DSLs)

Python 653 83 Updated Nov 9, 2025

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,164 82 Updated Aug 28, 2025

A Large-Scale Computation Graph Database for Tensor Compiler Research

Python 67 30 Updated Nov 9, 2025
Python 4 1 Updated Jul 18, 2025

A torch model extract tool which is helpful in building the torch unit test files.

Python 1 1 Updated Jul 17, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 1 Updated May 29, 2025

A working stiff with a working soul

C++ 1 Updated Jun 19, 2025

hpc-learning

757 48 Updated May 30, 2024

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,367 829 Updated Nov 6, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust 5,438 676 Updated Nov 10, 2025

Minimalistic large language model 3D-parallelism training

Python 2,304 254 Updated Sep 3, 2025

Fast and memory-efficient exact attention

Python 20,424 2,125 Updated Nov 9, 2025

My learning notes/codes for ML SYS.

Python 4,097 250 Updated Nov 9, 2025

Southeast University SRTP: research on multi-node MoE model parallelism strategies

Python 3 Updated May 13, 2025

Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.

Python 828 54 Updated May 14, 2025

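AIInfra (AI infrastructure) covers the AI system stack, from low-level hardware such as chips up to the software stack, that supports training and inference of large AI models.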

Jupyter Notebook 5,017 694 Updated Nov 6, 2025

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…

3,376 344 Updated Jul 25, 2025

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 40,641 4,614 Updated Nov 8, 2025

🙌 OpenHands: Code Less, Make More

Python 64,836 7,877 Updated Nov 9, 2025

Tutorials for writing high-performance GPU operators in AI frameworks.

Cuda 135 15 Updated Aug 12, 2023

《Machine Learning Systems: Design and Implementation》- Chinese Version

TeX 4,691 476 Updated Apr 13, 2024

NVMe over Fabrics user space initiator library.

C 36 3 Updated Sep 2, 2024

Running BERT without Padding

C++ 475 53 Updated Mar 18, 2022

[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Python 270 18 Updated May 1, 2025

A CUDA tutorial to make people learn CUDA program from 0

Cuda 258 65 Updated Jul 9, 2024

Learning materials for Stanford CS149 : Parallel Computing

C 253 43 Updated Jul 31, 2021

An optimized version of the JD.com Moutai flash-purchase tool

Python 254 4,858 Updated Dec 30, 2020