Stars
[CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
[CVPR 2025] Official repo for ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
Code for the Separate and Enhance work on better compositional generation from prompts
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
Joint image and depth inpainting, LDM3D
[SIGGRAPH 2025] Official implementation of 'Motion Inversion For Video Customization'
[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
Official implementation of AnimateDiff.
Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models"
Text2Room generates textured 3D meshes from a given text prompt using 2D text-to-image models (ICCV 2023).
Inpaint anything using Segment Anything and inpainting models.
Official PyTorch implementation of the paper "Neural Congealing: Aligning Images to a Joint Semantic Atlas" (CVPR 2023)
Zero-shot Image-to-Image Translation [SIGGRAPH 2023]
[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
Official repo of the paper "Pop2Piano: Pop Audio-based Piano Cover Generation"
A latent text-to-image diffusion model
Release for Improved Denoising Diffusion Probabilistic Models
[CVPR 2021] Content-Aware GAN Compression
Pretrained deep learning models for Jax/Flax: StyleGAN2, GPT2, VGG, ResNet, etc.
GLIDE: a diffusion-based text-conditional image synthesis model
Official PyTorch Implementation of "GAN-Supervised Dense Visual Alignment" (CVPR 2022 Oral, Best Paper Finalist)
[ACM MM 2021 Best Paper Award] Video Background Music Generation with Controllable Music Transformer