Stars
[CVPR 2025 Highlight] GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control
[SIGGRAPH'24] 2D Gaussian Splatting for Geometrically Accurate Radiance Fields
[NeurIPS 2025] Official implementation for our paper "Scaling Diffusion Transformers Efficiently via μP".
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
Official GitHub repository for the CVPR 2025 paper "Color Alignment in Diffusion"
[ICLR 2024] Official PyTorch implementation of "ControlVideo: Training-free Controllable Text-to-Video Generation"
[ICCV 2025 Highlight] Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction
The official implementation of CVPR'25 Oral paper "Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise"
[CVPR 2025] Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
[CVPR 2025] Learning Flow Fields in Attention for Controllable Person Image Generation
Release repo for our SLAM Handbook
StableDelight: Revealing Hidden Textures by Removing Specular Reflections
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
Official inference repo for FLUX.1 models
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
Official PyTorch implementation of "MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation" (ICML 2023)
[ECCV 2024] MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model.
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation (TMLR)
Code for ICCV'2025 "Real3D: Scaling Up Large Reconstruction Models with Real-World Images"
[NeurIPS 2024 Spotlight] Implementation of the paper "3D Gaussian Splatting as Markov Chain Monte Carlo"
[NeurIPS 2024] Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image
A generative speech model for daily dialogue.
[ICLR 2025] 3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting