- Hong Kong
Starred repositories
Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
Scaling Spatial Intelligence with Multimodal Foundation Models
[ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
Official codebase for ICML 2025 paper "Core Knowledge Deficits in Multi-Modal Language Models"
Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels with Hunyuan3D World Model
Wan: Open and Advanced Large-Scale Video Generative Models
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
OmniGen2: Exploration to Advanced Multimodal Generation.
Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
[CVPR25 Oral (Top 3.3%)] Official code for paper "Reconstructing Humans with a Biomechanically Accurate Skeleton".
[CVPR 2024 Highlight] XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies
This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reas…
[NeurIPS 2025] MLLMs Need 3D-Aware Representation Supervision for Scene Understanding
CUDA Python: Performance meets Productivity
SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation
[CVPR 2025 Oral & Best Paper Finalist] Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models
WiLoR: End-to-end 3D hand localization and reconstruction in-the-wild
[ICLR 2025] Track-On: Transformer-based Online Point Tracking with Memory, and [arXiv 2025] Track-On2: Enhancing Online Point Tracking with Memory
Official implementation of Continuous 3D Perception Model with Persistent State
Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation [SIGGRAPH Asia 2025]
TAPIP3D: Tracking Any Point in Persistent 3D Geometry
Let's make video diffusion practical!
Universal Monocular Metric Depth Estimation
Code for the paper MultiPhys: Multi-Person Physics-aware 3D Motion Estimation (CVPR 2024)