HTML 15 3 Updated Jun 25, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,120 1,630 Updated Jan 15, 2026

Cosmos-RL is a flexible and scalable Reinforcement Learning framework specialized for Physical AI applications.

Python 286 41 Updated Jan 18, 2026

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python 1,545 130 Updated Jan 17, 2026
Python 679 69 Updated Dec 30, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 66,016 8,020 Updated Jan 17, 2026

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 3,060 316 Updated Jan 17, 2026

TiLamb (Tibetan Large Language Model Base), a Tibetan large language model built by continued pre-training on LLaMA2-7B

34 5 Updated Apr 3, 2024

PyTorch native quantization and sparsity for training and inference

Python 2,630 404 Updated Jan 17, 2026

A Throughput-Optimized Pipeline Parallel Inference System for Large Language Models

Python 47 2 Updated Dec 24, 2025

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 635 136 Updated Jan 18, 2026

🚀🚀 Train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏

Python 37,546 4,483 Updated Jan 18, 2026
C++ 6 1 Updated Mar 11, 2025

My learning notes for ML SYS.

Python 5,072 329 Updated Jan 16, 2026

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,981 936 Updated Jan 16, 2026

Implement Flash Attention using Cute.

Cuda 100 8 Updated Dec 17, 2024

A PyTorch native platform for training generative AI models

Python 4,974 670 Updated Jan 18, 2026

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 9,396 926 Updated Jan 18, 2026

A Chinese-language tutorial on C++ templates. Unlike the well-known book C++ Templates, this series teaches C++ templates as a Turing-complete language in their own right, aiming to help readers fully master meta-programming. (Work in progress)

C++ 10,505 1,621 Updated Aug 20, 2024

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

Python 4,750 272 Updated Jul 18, 2025

"Machine Learning Systems: Design and Implementation" (Chinese version)

TeX 4,743 476 Updated Apr 13, 2024

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

73,373 8,424 Updated Dec 22, 2025

Excalidraw-CN is a whiteboard supporting Chinese handwriting fonts and multiple canvases, based on Excalidraw.

TypeScript 2,328 294 Updated Jan 16, 2024

Notes and interview questions for large language model (LLM) algorithm/application engineers

HTML 11,930 1,200 Updated Apr 30, 2025

FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups at medium batch sizes of up to 16-32 tokens.

Python 986 83 Updated Sep 4, 2024

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 155,362 31,776 Updated Jan 18, 2026

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 22,528 4,085 Updated Jan 19, 2026

hpc-learning

774 47 Updated May 30, 2024