University of Chinese Academy of Sciences
- Beijing
- UTC +08:00
- https://luoxubo.github.io/
- @xubo_luo
Highlights
- Pro
Lists (23)
Attention mechanism
Autonomous driving
clip
Efficiency
ekf
Event Camera
Facial expression recognition
flow matching
Homography Estimation
IELTS
Image fusion
Image matching: some nice image matching related works
Image retrieval
Lab homepage: some nice homepage templates for labs
Learning
Multi-sensor localization
NeRF
Paper codes
Pose estimation
Segmentation
SLAM with deep learning
Tracking: object tracking repos
Visual localization
Starred repositories
[AAAI 2024] Mono3DVG: 3D Visual Grounding in Monocular Images
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Official code release for CoRL'25 paper: VT-Refine: Learning Bimanual Assembly with Visuo-Tactile Feedback via Simulation Fine-Tuning
Repository of the paper "AnyUp: Universal Feature Upsampling".
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Reading list for research topics in embodied vision
Code for the IROS 2025 RoboSense Challenge Track 1: LLM for Driving
[ACMMM 2025] Official implementation of SeqVLM: Proposal-Guided Multi-View Sequences Reasoning via VLM for Zero Shot 3D Visual Grounding
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
SLAM-Former: Putting SLAM into One Transformer
[CVPR'25] SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding
A Lunar Segmentation, Navigation and Reconstruction Dataset Based on Multi-Sensor Data for Autonomous Exploration
A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.
A curated repository from the MLNLP community to help everyone avoid small mistakes in paper submissions. Paper Writing Tips
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Building General-Purpose Robots Based on Embodied Foundation Model
[CVPR 2025] UniK3D: Universal Camera Monocular 3D Estimation
ViSTA-SLAM: Visual SLAM with Symmetric Two-view Association
Code for FastVGGT: Training-Free Acceleration of Visual Geometry Transformer
InternRobotics' open platform for building generalized navigation foundation models.
ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
[IROS25] Combining Flow Matching and Depth Priors for Efficient Navigation
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model