
Reference PyTorch implementation and models for DINOv3

Jupyter Notebook 8,242 559 Updated Nov 3, 2025

Native Multimodal Models are World Learners

Python 1,218 42 Updated Nov 7, 2025

An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation

Python 1,196 55 Updated Oct 16, 2025

This repository introduces a large-scale video aesthetics database, VADB, and proposes a novel video aesthetics scoring framework, VADB-Net.

Python 16 1 Updated Oct 30, 2025

EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling

Python 163 4 Updated Nov 9, 2025
Python 1,607 68 Updated Oct 28, 2025

Building a Foundational Guardrail for General Agentic Systems via Synthetic Data

Python 33 1 Updated Oct 26, 2025

Official implementation of HPSv3: Towards Wide-Spectrum Human Preference Score (ICCV2025)

Python 216 12 Updated Sep 8, 2025

[NeurIPS 2025] Improving Video Generation with Human Feedback

Python 330 7 Updated Sep 24, 2025

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 16,279 1,298 Updated Nov 10, 2025

The collection of awesome papers on alignment of diffusion models.

365 17 Updated Oct 27, 2025

[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL

Python 1,570 91 Updated Nov 4, 2025

Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

1,110 37 Updated Oct 4, 2025
Python 6 Updated Jul 21, 2025

[NeurIPS 2025] A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness

12 Updated Nov 9, 2025
7 Updated Jun 20, 2025

Paper List of Inference/Test Time Scaling/Computing

Python 320 9 Updated Aug 28, 2025

[AAAI 26 Demo] Official repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal Prompting

Python 58 4 Updated Oct 30, 2025
HTML 37 1 Updated Jun 20, 2025

[AAAI 2025] Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding

Python 33 Updated Mar 21, 2025

LoRA training script for Lightricks LTX-Video

Python 66 4 Updated Feb 12, 2025

A pipeline parallel training script for diffusion models.

Python 1,701 229 Updated Nov 7, 2025

VideoSys: An easy and efficient system for video generation

Python 2,005 132 Updated Aug 27, 2025

👁️ 🖼️ 🔥PyTorch Toolbox for Image Quality Assessment, including PSNR, SSIM, LPIPS, FID, NIQE, NRQM(Ma), MUSIQ, TOPIQ, NIMA, DBCNN, BRISQUE, PI and more...

Python 2,883 227 Updated Oct 20, 2025

Official repository for LTX-Video

Python 8,747 805 Updated Oct 25, 2025

Physical laws underpin all existence, and harnessing them for generative modeling opens boundless possibilities for advancing science and shaping the future!

234 5 Updated Apr 21, 2025

Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation

Python 30 2 Updated Jun 30, 2025

Benchmarking physical understanding in generative video models

Python 219 19 Updated Oct 28, 2025

[ICCV 2025] LayerAnimate: Layer-specific Control for Animation

Python 192 7 Updated Aug 22, 2025