HTML 14 3 Updated Jun 25, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,079 1,619 Updated Jan 9, 2026

Cosmos-RL is a flexible and scalable Reinforcement Learning framework specialized for Physical AI applications.

Python 273 38 Updated Jan 9, 2026

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python 1,517 127 Updated Jan 9, 2026
Python 672 68 Updated Dec 30, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 65,320 7,939 Updated Jan 9, 2026

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end accuracy across language, image, and video models.

Cuda 3,026 307 Updated Dec 22, 2025
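The quantized-attention entry above rests on one core idea: quantize Q and K to int8, do the score matmul in integer arithmetic, then rescale back to float. A minimal NumPy sketch of that idea, using simple symmetric per-tensor scales (the actual kernels use finer-grained smoothing and per-block scales, which this does not attempt):

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization: returns codes and a scale."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def attention_scores_int8(Q, K):
    """Approximate Q @ K^T via an int8 matmul plus dequantization."""
    qQ, sQ = quantize_int8(Q)
    qK, sK = quantize_int8(K)
    # Accumulate in int32 to avoid overflow, then rescale to float.
    scores_int = qQ.astype(np.int32) @ qK.astype(np.int32).T
    return scores_int.astype(np.float32) * (sQ * sK)

Q = rng.standard_normal((4, 8)).astype(np.float32)
K = rng.standard_normal((4, 8)).astype(np.float32)
approx = attention_scores_int8(Q, K)
exact = Q @ K.T
print(np.max(np.abs(approx - exact)))  # small quantization error
```

The int8 matmul is where the speedup comes from on real hardware: int8 tensor-core throughput is a multiple of fp16 throughput, and the dequantization is a single scalar multiply per output tile.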

TiLamb (Tibetan Large Language Model Base), a Tibetan large language model built by continued pretraining on LLaMA2-7B.

35 5 Updated Apr 3, 2024

PyTorch native quantization and sparsity for training and inference

Python 2,613 394 Updated Jan 9, 2026

A Throughput-Optimized Pipeline Parallel Inference System for Large Language Models

Python 47 2 Updated Dec 24, 2025

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 617 132 Updated Jan 7, 2026
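The speculative-decoding entry above trains draft models for a draft-then-verify loop: a cheap model proposes k tokens, and the target model accepts each with probability min(1, p_target/p_draft). A toy sketch of that acceptance loop over a tiny vocabulary (the `draft_model` and `target_model` callables are hypothetical stand-ins, and the residual resampling step after a rejection is omitted):

```python
import random

random.seed(0)

# Toy "models": each maps a context (tuple of tokens) to a next-token
# distribution over a tiny vocabulary. They stand in for a small draft
# model and a large target model.
VOCAB = [0, 1, 2, 3]

def draft_model(ctx):
    # Cheap uniform distribution (hypothetical small LM).
    return {t: 0.25 for t in VOCAB}

def target_model(ctx):
    # "Expensive" distribution (hypothetical large LM).
    probs = {t: 0.1 for t in VOCAB}
    probs[ctx[-1] % len(VOCAB)] = 0.7  # favors one continuation
    return probs

def speculative_step(ctx, k=4):
    """Draft k tokens with the cheap model, then accept or reject each
    against the target model using the standard rejection test."""
    drafted, c = [], list(ctx)
    for _ in range(k):
        p = draft_model(tuple(c))
        tok = random.choices(VOCAB, weights=[p[t] for t in VOCAB])[0]
        drafted.append(tok)
        c.append(tok)
    accepted, c = [], list(ctx)
    for tok in drafted:
        q = draft_model(tuple(c))[tok]   # draft probability
        p = target_model(tuple(c))[tok]  # target probability
        if random.random() < min(1.0, p / q):
            accepted.append(tok)
            c.append(tok)
        else:
            break  # first rejection ends the speculated run
    return accepted

print(speculative_step((1, 2), k=4))
```

The acceptance test is what makes the scheme exact in expectation: accepted tokens are distributed as if sampled from the target model alone, so a better-trained draft model only raises the acceptance rate, never changes the output distribution.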

🚀🚀 Train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏

Python 36,823 4,375 Updated Jan 7, 2026
C++ 6 1 Updated Mar 11, 2025

My learning notes for ML SYS.

Python 4,996 325 Updated Jan 8, 2026

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,961 926 Updated Dec 15, 2025

An implementation of Flash Attention using CuTe.

Cuda 100 8 Updated Dec 17, 2024

A PyTorch native platform for training generative AI models

Python 4,943 662 Updated Jan 9, 2026

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 9,276 912 Updated Jan 7, 2026

A Chinese-language tutorial guide to C++ templates. Unlike the well-known book C++ Templates, this series teaches C++ templates as a Turing-complete language in their own right, aiming to help readers fully master template meta-programming. (Work in progress.)

C++ 10,496 1,620 Updated Aug 20, 2024

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

Python 4,748 272 Updated Jul 18, 2025

《Machine Learning Systems: Design and Implementation》- Chinese Version

TeX 4,731 476 Updated Apr 13, 2024

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

72,676 8,347 Updated Dec 22, 2025

Excalidraw-CN is a whiteboard tool based on Excalidraw that supports Chinese handwriting fonts and multiple canvases.

TypeScript 2,324 293 Updated Jan 16, 2024

Notes on knowledge and interview questions for large language model (LLM) algorithm and application engineers.

HTML 11,728 1,187 Updated Apr 30, 2025

FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python 980 83 Updated Sep 4, 2024
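The FP16xINT4 entry above gets its memory savings from packing two 4-bit weight codes per byte and dequantizing on the fly with a scale and zero point. A plain-Python sketch of that pack/unpack/dequantize scheme (a simplified illustration of the storage format, not the CUDA kernel, and with a single hypothetical per-tensor scale rather than per-group scales):

```python
def pack_int4(vals):
    """Pack unsigned 4-bit codes (0..15) into bytes, two per byte."""
    assert len(vals) % 2 == 0
    out = bytearray()
    for lo, hi in zip(vals[0::2], vals[1::2]):
        out.append((hi << 4) | lo)
    return bytes(out)

def unpack_int4(packed):
    """Recover the 4-bit codes, low nibble first."""
    vals = []
    for b in packed:
        vals.append(b & 0x0F)
        vals.append(b >> 4)
    return vals

def dequantize(packed, scale, zero_point=8):
    """Recover approximate FP values: w ≈ (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in unpack_int4(packed)]

codes = [0, 3, 8, 15]          # quantized 4-bit weight codes
packed = pack_int4(codes)      # 2 bytes instead of 4
print(dequantize(packed, scale=0.5))  # [-4.0, -2.5, 0.0, 3.5]
```

The ~4x speedup at small batch sizes follows from this layout: decoding is memory-bandwidth-bound, so shrinking the weights from 16 bits to 4 bits cuts the bytes read per token by roughly 4x while the dequantization cost stays negligible.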

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 154,779 31,662 Updated Jan 9, 2026

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 22,214 4,002 Updated Jan 9, 2026

hpc-learning

771 47 Updated May 30, 2024