- Nanyang Technological University (NTU)
- Singapore
- https://ziqihuangg.github.io
- @ziqi_huang_
Stars
The Quest for Generalizable Motion Generation: Data, Model, and Evaluation
MatAnyone 2: Scaling Video Matting via a Learned Quality Evaluator
🌐 WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World
This is a collection of recent papers on reasoning in video generation models.
🔥 An open-source survey of the latest video reasoning tasks, paradigms, and benchmarks.
Code for CineScale, higher-resolution video generation based on Wan
[ICIP2025 Spotlight] Efficient and High-Fidelity Image Generation
Wan: Open and Advanced Large-Scale Video Generative Models
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
A list of works on video generation towards world models
Let's make video diffusion practical!
Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis (ICCV, 2025)
[CVPR 2025] MatAnyone: Stable Video Matting with Consistent Memory Propagation
A Python package that makes it easy for developers to create AI apps powered by various AI providers.
[ACL2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible
Understand Human Behavior to Align True Needs
Implementation of P+: Extended Textual Conditioning in Text-to-Image Generation
LAVIS - A One-stop Library for Language-Vision Intelligence
Lumina-T2X is a unified framework for Text to Any Modality Generation
A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
[CSUR] A Survey on Video Diffusion Models
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
[ECCV 2024] FreeInit: Bridging Initialization Gap in Video Diffusion Models
PyTorch implementation of RCG https://arxiv.org/abs/2312.03701
[CVPR2024 Highlight] VBench - We Evaluate Video Generation
[SIGGRAPH Asia 2023] Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
FreeU: Free Lunch in Diffusion U-Net (CVPR2024 Oral)
A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python