PyTorch Distributed native training library for LLMs/VLMs with out-of-the-box (OOTB) Hugging Face support

Python 187 25 Updated Nov 26, 2025

The best ChatGPT that $100 can buy.

Python 37,613 4,608 Updated Nov 17, 2025

Post-training with Tinker

Python 2,218 189 Updated Nov 25, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 4,025 325 Updated Nov 26, 2025

Cosmos-Transfer1 is a world-to-world transfer model designed to bridge the perceptual divide between simulated and real-world environments.

Python 734 100 Updated Oct 29, 2025

Render any git repo into a single static HTML page for humans or LLMs

Python 1,925 190 Updated Aug 21, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 19,307 1,945 Updated Nov 1, 2025

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 496 111 Updated Nov 26, 2025

(Best/better) practices for Megatron on veRL, with a tuning guide

Shell 103 8 Updated Sep 26, 2025

slime is an LLM post-training framework for RL Scaling.

Python 2,598 284 Updated Nov 26, 2025

🤗A PyTorch-native Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.

Python 589 27 Updated Nov 26, 2025

FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation [Efficient ML Model]

Python 45 6 Updated Sep 6, 2025

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,505 59 Updated Jun 14, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 16,666 2,661 Updated Nov 26, 2025

MAGI-1: Autoregressive Video Generation at Scale

Python 3,562 215 Updated Jun 17, 2025

A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training

Python 567 32 Updated Nov 26, 2025

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python 1,342 106 Updated Nov 26, 2025

This package contains the original 2012 AlexNet code.

Cuda 2,778 360 Updated Mar 12, 2025

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

Python 31,755 6,535 Updated Nov 26, 2025

Sampling profiler for Python programs

Rust 14,625 486 Updated Nov 24, 2025

Fast Multi-dimensional Sparse Attention

C++ 664 52 Updated Nov 19, 2025

HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo

Python 1,736 176 Updated May 20, 2025

Analyze computation-communication overlap in V3/R1.

1,120 143 Updated Mar 21, 2025

A lightweight data processing framework built on DuckDB and 3FS.

Python 4,847 431 Updated Mar 5, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,482 966 Updated Oct 24, 2025

Expert Parallelism Load Balancer

Python 1,312 195 Updated Mar 24, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 14,772 2,165 Updated Jul 17, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,912 753 Updated Nov 25, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,760 1,005 Updated Nov 25, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,884 905 Updated Sep 30, 2025