Stars
Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO
This repository summarizes recent advances in the VLA + RL paradigm and provides a taxonomic classification of relevant works.
RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning.
Official implementation of "OmniX: From Unified Panoramic Generation and Perception to Graphics-Ready 3D Scenes".
LongLive: Real-time Interactive Long Video Generation
This is the official repository for the paper "FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark"
AudioStory: Generating Long-Form Narrative Audio with Large Language Models
T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation
Structured Video Comprehension of Real-World Shorts
[NeurIPS 2025] OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
Official implementation of the paper: "StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling"
[ICCV 2025] Official implementation of the paper "DreamCube: 3D Panorama Generation via Multi-plane Synchronization".
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
[ICCV 2025] Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation"
HoloPart: Generative 3D Part Amodal Segmentation
[ICCV 2025] AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction
[ICCV 2025] TokenBridge: Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation. https://yuqingwang1029.github.io/TokenBridge
[ICCV 2025] GameFactory: Creating New Games with Generative Interactive Videos
[CVPR 2025 Highlight] PAR: Parallelized Autoregressive Visual Generation. https://yuqingwang1029.github.io/PAR-project
[NeurIPS 2023 & TPAMI] T2I-CompBench (++) for Compositional Text-to-image Generation Evaluation
[CVPR 2025] T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation
[CVPR 2024] DreamComposer: Controllable 3D Object Generation via Multi-View Conditions