feifeibear

Jiarui Fang（方佳瑞） feifeibear

Democratizing AGI

1.7k followers · 101 following

Stars

NVIDIA-NeMo / Automodel

PyTorch Distributed native training library for LLMs/VLMs with out-of-the-box (OOTB) Hugging Face support

Python · 187 stars · 25 forks · Updated Nov 26, 2025

karpathy / nanochat

The best ChatGPT that $100 can buy.

Python · 37,613 stars · 4,608 forks · Updated Nov 17, 2025

thinking-machines-lab / tinker-cookbook

Post-training with Tinker

Python · 2,218 stars · 189 forks · Updated Nov 25, 2025

tile-ai / tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ · 4,025 stars · 325 forks · Updated Nov 26, 2025

nvidia-cosmos / cosmos-transfer1

Cosmos-Transfer1 is a world-to-world transfer model designed to bridge the perceptual divide between simulated and real-world environments.

Python · 734 stars · 100 forks · Updated Oct 29, 2025

karpathy / rendergit

Render any git repo into a single static HTML page for humans or LLMs

Python · 1,925 stars · 190 forks · Updated Aug 21, 2025

openai / gpt-oss

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python · 19,307 stars · 1,945 forks · Updated Nov 1, 2025

sgl-project / SpecForge

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python · 496 stars · 111 forks · Updated Nov 26, 2025

ISEEKYAN / verl_megatron_practice

Best (or better) practices for Megatron on veRL, plus a tuning guide

Shell · 103 stars · 8 forks · Updated Sep 26, 2025

THUDM / slime

slime is an LLM post-training framework for RL Scaling.

Python · 2,598 stars · 284 forks · Updated Nov 26, 2025

vipshop / cache-dit

🤗 A PyTorch-native Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.

Python · 589 stars · 27 forks · Updated Nov 26, 2025

NoakLiu / FastCache-xDiT

Forked from xdit-project/xDiT

FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation [Efficient ML Model]

Python · 45 stars · 6 forks · Updated Sep 6, 2025

ByteDance-Seed / Seed1.5-VL

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook · 1,505 stars · 59 forks · Updated Jun 14, 2025

volcengine / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python · 16,666 stars · 2,661 forks · Updated Nov 26, 2025

SandAI-org / MAGI-1

MAGI-1: Autoregressive Video Generation at Scale

Python · 3,562 stars · 215 forks · Updated Jun 17, 2025

SandAI-org / MagiAttention

A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training

Python · 567 stars · 32 forks · Updated Nov 26, 2025

ByteDance-Seed / VeOmni

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python · 1,342 stars · 106 forks · Updated Nov 26, 2025

computerhistory / AlexNet-Source-Code

This package contains the original 2012 AlexNet code.

Cuda · 2,778 stars · 360 forks · Updated Mar 12, 2025

huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

Python · 31,755 stars · 6,535 forks · Updated Nov 26, 2025

benfred / py-spy

Sampling profiler for Python programs

Rust · 14,625 stars · 486 forks · Updated Nov 24, 2025

SHI-Labs / NATTEN

Fast Multi-dimensional Sparse Attention

C++ · 664 stars · 52 forks · Updated Nov 19, 2025

Tencent-Hunyuan / HunyuanVideo-I2V

HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo

Python · 1,736 stars · 176 forks · Updated May 20, 2025

deepseek-ai / profile-data

Analyze computation-communication overlap in V3/R1.

1,120 stars · 143 forks · Updated Mar 21, 2025

deepseek-ai / smallpond

A lightweight data processing framework built on DuckDB and 3FS.

Python · 4,847 stars · 431 forks · Updated Mar 5, 2025

deepseek-ai / 3FS

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ · 9,482 stars · 966 forks · Updated Oct 24, 2025

deepseek-ai / EPLB

Expert Parallelism Load Balancer

Python · 1,312 stars · 195 forks · Updated Mar 24, 2025

Wan-Video / Wan2.1

Wan: Open and Advanced Large-Scale Video Generative Models

Python · 14,772 stars · 2,165 forks · Updated Jul 17, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 5,912 stars · 753 forks · Updated Nov 25, 2025

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda · 8,760 stars · 1,005 forks · Updated Nov 25, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ · 11,884 stars · 905 forks · Updated Sep 30, 2025

Lists (5)

Diffusion Models

Diffusion Models Inference

GPU Acceleration

LLM Inference

LLM Models