Stars
[ICCV 2025] Official implementation of the paper "VACE: All-in-One Video Creation and Editing"
Official implementation of "OmniX: From Unified Panoramic Generation and Perception to Graphics-Ready 3D Scenes".
[NeurIPS'25] Official repository of Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations
Fast and Universal 3D reconstruction model for versatile tasks
Official Implementations for Paper - HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives
Math-VR Benchmark & CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images
[SIGGRAPH Asia 2025] OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion
LongLive: Real-time Interactive Long Video Generation
Official implementation for WorldScore: A Unified Evaluation Benchmark for World Generation
ViPE: Video Pose Engine for Geometric 3D Perception
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
This is the official repository for the paper "FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark"
[arXiv'25] IC-Custom: Diverse Image Customization via In-Context Learning
Voyager is an interactive RGBD video generation model conditioned on camera input, and supports real-time 3D reconstruction.
Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction"
[ICCV 2025 Oral] MVTracker: Multi-view 3D Point Tracking
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)
T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model
Reference PyTorch implementation and models for DINOv3
Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition
Pusa: Thousands Timesteps Video Diffusion Model
The absolute trainer to light up AI agents.
Official repository of In-Context LoRA for Diffusion Transformers
Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels with Hunyuan3D World Model
Wan: Open and Advanced Large-Scale Video Generative Models
Let's make video diffusion practical!
Code for Streaming 4D Visual Geometry Transformer