- Tsinghua University
- Beijing, China
Stars
[ICLR 2025, Oral] EmbodiedSAM: Online Segment Any 3D Thing in Real Time
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions
[IROS 2025 Award Finalist] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
Unified framework for robot learning built on NVIDIA Isaac Sim
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Cosmos-Reason1 models understand the physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning processes.
Cosmos-Predict1 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
NVIDIA Isaac GR00T N1.5 - A Foundation Model for Generalist Robots.
No fortress, purely open ground. OpenManus is Coming.
🦜🔗 Build context-aware reasoning applications
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
FlashMLA: Efficient Multi-head Latent Attention Kernels
HE-Drive: Human-Like End-to-End Driving with Vision Language Models
OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.
[ICCV 2025] Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model
TuShare is a utility for crawling historical data of China stocks
[ECCV 2024 Oral] LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation.
CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets
SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation
[NeurIPS 2024] A Generalizable World Model for Autonomous Driving
Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.
NeuroNCAP benchmark for end-to-end autonomous driving