Stars
PyTorch implementation of "VFM-VAE" (arXiv:2510.18457).
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
A comprehensive JAX/NNX library for diffusion and flow matching generative algorithms, featuring DiT (Diffusion Transformer) and its variants as the primary backbone with support for ImageNet train…
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
The Sound of Simulation (CoRL 2025 Best Paper Finalist)
[ACM MM Award] AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset
NeurIPS 2025 Spotlight; ICLR 2024 Spotlight; CVPR 2024; EMNLP 2024
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)
A framework for few-shot evaluation of language models.
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
Lightweight coding agent that runs in your terminal
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
[ICCV'25 Best Paper Finalist] ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
This repository contains the source code for the paper "First Order Motion Model for Image Animation"
Code for Scaling Language-Free Visual Representation Learning (WebSSL)
Train vision models using JAX and 🤗 transformers
[ICML 2024] CLLMs: Consistency Large Language Models
Code for the paper "Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers" [ICCV 2025]
DataComp: In search of the next generation of multimodal datasets