Stars
Official Implementations for Paper - HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives
Kaleido: Open-sourced multi-subject reference video generation model, enabling controllable, high-fidelity video synthesis from multiple image references.
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations https://video-prediction-policy.github.io
(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
Concat-ID: Towards Universal Identity-Preserving Video Synthesis
[CVPR 2025] Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer
[AAAI 2026] VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
[CVPR 2025 Highlight🔥] Identity-Preserving Text-to-Video Generation by Frequency Decomposition
Simple ControlNet module for the CogVideoX model.
Helpful tools and examples for working with flex-attention
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
[CVPR 2025 Highlight] 3DTopia-XL: High-Quality 3D PBR Asset Generation via Primitive Diffusion
Keyframe Interpolation with CogVideoX
Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen
[CVPR'25] Tora: Trajectory-oriented Diffusion Transformer for Video Generation
Scalable and memory-optimized training of diffusion models
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
📹 A more flexible framework that can generate videos at any resolution and create videos from images.
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator (NeurIPS 2024)
Official code for VEnhancer: Generative Space-Time Enhancement for Video Generation
Official implementation of Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer
Official Code for MotionCtrl [SIGGRAPH 2024]