Institute of Automation, Chinese Academy of Sciences - Beijing
Stars
Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"
An all-in-one robot manipulation learning suite for training and evaluating policy models on various datasets and benchmarks.
A comprehensive collection of resources on robot manipulation, including papers, code, and related websites.
[TPAMI 2025] Code for "Diff9D: Diffusion-Based Domain-Generalized Category-Level 9DoF Object Pose Estimation".
This repository summarizes recent advances in the VLA + RL paradigm and provides a taxonomic classification of relevant works.
PyTorch code and models for VJEPA2 self-supervised learning from video.
[TASE 2025] Efficient Alignment of Unconditioned Action Prior for Language-conditioned Pick and Place in Clutter
✨✨ [NeurIPS 2025] Official implementation of BridgeVLA
ZeroGrasp: Zero-Shot Shape Reconstruction Enabled Robotic Grasping [CVPR 2025]
This is the official code repo for GLOVER and GLOVER++.
A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.
[CVPR 2025] The official implementation of "Universal Actions for Enhanced Embodied Foundation Models"
[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024]
🤖 RoboOS: A Universal Embodied Operating System for Cross-Embodied and Multi-Robot Collaboration
A powerful tool for creating fine-tuning datasets for LLMs
[NeurIPS 2025]⭐️ Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)