- Shanghai Jiao Tong University
- Shanghai
- @yangshuai1227
- https://YS-IMTech.github.io
Starred repositories
Cambrian-S: Towards Spatial Supersensing in Video
[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention
ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation
We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that Sora-2 surpasses GPT5 by 10% on eyeballing puzzles and reache…
Native Multimodal Models are World Learners
Krea Realtime 14B. An open-source realtime AI video model.
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
This is the official implementation for Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1.
[Preprint 2025] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
🚀🚀 "Large Models": Train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏
SOTAMak1r / Infinite-Forcing
Forked from guandeh17/Self-Forcing
Infinite-Forcing: Towards Infinite-Long Video Generation
Reference PyTorch implementation and models for DINOv3
A tool for running and customizing real-time, interactive generative AI pipelines and models
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Ctrl-World: A Controllable Generative World Model for Robot Manipulation
Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star 🌟 if you find it useful.
A simple state update rule to enhance length generalization for CUT3R
AHN: Artificial Hippocampus Networks for Efficient Long-Context Modeling
The official implementation of the paper “VChain: Chain-of-Visual-Thought for Reasoning in Video Generation”
Official Repo for Rolling Forcing: Autoregressive Long Video Diffusion in Real Time
LongLive: Real-time Interactive Long Video Generation
An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"
A minimal implementation of DeepMind's Genie world model
An official implementation of "SIM-CoT: Supervised Implicit Chain-of-Thought"
Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
Tongyi Deep Research, the Leading Open-source Deep Research Agent