Stars
Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.
[NeurIPS 2025] PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers
[ICCV'25 Best Paper Finalist] ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
[NeurIPS 2025] Image editing is worth a single LoRA! 0.1% of the training data for fantastic image editing! Surpasses GPT-4o in ID persistence. MoE checkpoint released! Runs in only 4GB of VRAM!
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Let's make video diffusion practical!
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models
📹 A more flexible framework that can generate videos at any resolution and create videos from images.
[ICCV 2025 & ICCV 2025 RIWM Outstanding Paper] Aether: Geometric-Aware Unified World Modeling
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers
[ICCV 2025, Oral] TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
Wan: Open and Advanced Large-Scale Video Generative Models
Collection of scripts to build small-scale datasets for fine-tuning video generation models.
LoRA training script for Lightricks LTX-Video
[NOTE] I do not have enough resources to maintain VMS; please use Ostris's AI-Toolkit instead
📄 Configuration files that enhance Cursor AI editor experience with custom rules and behaviors
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
PyTorch native quantization and sparsity for training and inference
[ICLR 2025] Official implementation of "DiffSplat: Repurposing Image Diffusion Models for Scalable 3D Gaussian Splat Generation".
The ultimate training toolkit for finetuning diffusion models
Scalable and memory-optimized training of diffusion models
A pipeline parallel training script for diffusion models.
musubi-tuner modified to tune image2video/video infilling
Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
A custom node for ComfyUI that adds cinematic and movie scene styles to video generation prompts. This node helps create more dynamic and professional-looking video outputs by incorporating iconic …
A general fine-tuning kit geared toward diffusion models.
Code for our paper: Learning Camera Movement Control from Real-World Drone Videos