Stars
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA (+ more DSLs)
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
A Large-Scale Computation Graph Database for Tensor Compiler Research
A Torch model extraction tool that helps build Torch unit-test files.
RbRe145 / vllm
Forked from vllm-project/vllm. A high-throughput and memory-efficient inference and serving engine for LLMs.
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
A Datacenter Scale Distributed Inference Serving Framework
Minimalistic large language model 3D-parallelism training
Fast and memory-efficient exact attention
My learning notes/codes for ML SYS.
Extends OpenRLHF to support LMM RL training, reproducing DeepSeek-R1 on multimodal tasks.
AIInfra (AI Infrastructure) covers the AI system stack from the underlying hardware, such as chips, up through the software stack that supports training and inference of large AI models.
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Tutorials for writing high-performance GPU operators in AI frameworks.
《Machine Learning Systems: Design and Implementation》- Chinese Version
[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
A CUDA tutorial for learning CUDA programming from scratch.
Learning materials for Stanford CS149 : Parallel Computing