- Stanford University
- Stanford, CA
- UTC -08:00
- haoyi-duan.github.io
Stars
Official implementation of the paper: "FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models"
[ArXiv 2025] A survey of controllable video generation. This repo is the official awesome list for "Controllable video generation: A survey"
Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"
Official implementation of Señorita-2M [Weights and Dataset]: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists
A SOTA open-source image editing model that aims for performance comparable to closed-source models such as GPT-4o and Gemini 2 Flash.
Rectified Flow Inversion (RF-Inversion) - ICLR 2025
Enjoy the magic of Diffusion models!
[NeurIPS 2025] Official code for Reconstruct, Inpaint, Test-Time Finetune: Dynamic Novel-view Synthesis from Monocular Videos
[NeurIPS 2025 D&B] Open-source Multi-agent Poster Generation from Papers
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
[SIGGRAPH-ASIA 2025] Official implementation of "VideoFrom3D: 3D Scene Video Generation via Complementary Image and Video Diffusion Models"
[arXiv 2025] VisualMimic: Visual Humanoid Loco-Manipulation via Motion Tracking and Generation
MetaDrive: Lightweight driving simulator for everyone
VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection (ICCV 2021)
[CVPR 2024 Highlight] GenAD: Generalized Predictive Model for Autonomous Driving
[arXiv 2025] GMR: General Motion Retargeting. Retarget human motions into diverse humanoid robots in real time on CPU. Retargeter for TWIST.
Story-Based Retrieval with Contextual Embeddings. Largest freely available movie video dataset. [ACCV'20]
[ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
A pipeline parallel training script for diffusion models.
TripoSR: Fast 3D Object Reconstruction from a Single Image
Official code for 4Diffusion: Multi-view Video Diffusion Model for 4D Generation.
Video-P2P: Video Editing with Cross-attention Control
MagicEdit: High-Fidelity Temporally Coherent Video Editing
[SIGGRAPH Asia 2023] Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
[CVPR'24 Highlight] Official PyTorch implementation of CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
Make self forcing endless. Add cache purging. Add prompt controllability.
Tooling for the Common Objects In 3D dataset.