Stars
BEHAVIOR-1K: a platform for accelerating Embodied AI research. Join our Discord for support: https://discord.gg/bccR5vGFEx
PyTorch code and models for V-JEPA self-supervised learning from video.
Papers on Efficient Diffusion Models
[CVPR 2025] RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete. Official Repository.
Qwen3-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
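As a quick orientation, below is a minimal sketch of prompting a Qwen3-VL checkpoint through the Hugging Face transformers "image-text-to-text" pipeline. The checkpoint ID, image URL, and prompt are illustrative assumptions, not taken from the repository; check the Qwen organization on Hugging Face for the actual released checkpoints.

```python
# Minimal sketch: chatting with a Qwen3-VL checkpoint via transformers.
# The model ID below is a hypothetical example, not a confirmed release name.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Qwen/Qwen3-VL-4B-Instruct")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/scene.jpg"},  # placeholder image
            {"type": "text", "text": "Describe what the robot arm is holding."},
        ],
    }
]
out = pipe(text=messages, max_new_tokens=64)
print(out[0]["generated_text"])
```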
PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability
[NeurIPS 2025] SD-VLM: Spatial Measuring and Understanding with Depth-encoded Vision Language Models
[CVPR 2024] Hierarchical Diffusion Policy for Multi-Task Robotic Manipulation
Code for FLIP: Flow-Centric Generative Planning for General-Purpose Manipulation Tasks
A curated list of awesome works in world modeling, aiming to serve as a one-stop resource for researchers, practitioners, and enthusiasts interested in the field.
Automatically fetches Embodied AI papers from arXiv.
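A fetcher like this typically polls the public arXiv Atom API. The sketch below shows one minimal way to do that with the third-party feedparser package; the query string, sort options, and helper name are assumptions for illustration, not this repository's actual code.

```python
# Minimal sketch: polling the public arXiv API for recent embodied-AI papers.
import urllib.parse

import feedparser  # third-party: pip install feedparser

ARXIV_API = "http://export.arxiv.org/api/query"

def fetch_recent(query: str = 'all:"embodied AI"', max_results: int = 10):
    """Return (title, link) pairs for the newest arXiv submissions matching `query`."""
    params = urllib.parse.urlencode({
        "search_query": query,
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    })
    feed = feedparser.parse(f"{ARXIV_API}?{params}")  # arXiv serves an Atom feed
    return [(entry.title, entry.link) for entry in feed.entries]

if __name__ == "__main__":
    for title, link in fetch_recent():
        print(f"{title}\n  {link}")
```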
An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
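To illustrate the API standard, here is a minimal random-agent episode loop against one of the bundled reference environments. Note the five-tuple returned by step(): Gymnasium splits episode termination into terminated and truncated, replacing Gym's old single done flag.

```python
# Minimal episode loop against the Gymnasium API, using the bundled
# CartPole reference environment and a random placeholder policy.
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=42)          # reset returns (observation, info)
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # random policy as a placeholder
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated      # episode ends on either condition
env.close()
print(f"episode return: {total_reward}")
```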
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
Zxy-MLlab / LIBERO-PRO
Forked from Lifelong-Robot-Learning/LIBERO. The official repository of LIBERO-PRO, an evaluation extension of the original LIBERO benchmark.
A curated paper list and taxonomy of efficient Vision-Language-Action (VLA) models for embodied manipulation.
[ICCV 2025] From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers
VLA-RFT: Vision-Language-Action Models with Reinforcement Fine-Tuning
Code to train a compliant whole-body controller for the Unitree B1 + Z1
🔥 This is a curated list of research for "A survey on Efficient Vision-Language Action Models". We will continue to maintain and update the repository, so follow us to keep up with the latest developments.
My learning notes and code for ML systems.
RynnVLA-002: A Unified Vision-Language-Action and World Model
moojink / openvla-oft
Forked from openvla/openvla. Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success.
F1: A Vision Language Action Model Bridging Understanding and Generation to Actions
ICCV 2025 | TesserAct: Learning 4D Embodied World Models