Purdue University
Bellevue
https://song630.github.io/yizhisong.github.io/
Stars
Reference PyTorch implementation and models for DINOv3
An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation
This repository introduces a large-scale video aesthetics database, VADB, and proposes a novel video aesthetics scoring framework, VADB-Net.
EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
Official implementation of HPSv3: Towards Wide-Spectrum Human Preference Score (ICCV2025)
[NeurIPS 2025] Improving Video Generation with Human Feedback
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
The collection of awesome papers on alignment of diffusion models.
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
[NeurIPS 2025] A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness
Paper List of Inference/Test Time Scaling/Computing
[AAAI 26 Demo] Official repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal Prompting
[AAAI 2025] Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
LoRA training script for Lightricks LTX-Video
A pipeline parallel training script for diffusion models.
VideoSys: An easy and efficient system for video generation
👁️ 🖼️ 🔥PyTorch Toolbox for Image Quality Assessment, including PSNR, SSIM, LPIPS, FID, NIQE, NRQM(Ma), MUSIQ, TOPIQ, NIMA, DBCNN, BRISQUE, PI and more...
Physical laws underpin all existence, and harnessing them for generative modeling opens boundless possibilities for advancing science and shaping the future!
Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation
Benchmarking physical understanding in generative video models
[ICCV 2025] LayerAnimate: Layer-specific Control for Animation