We are building the embodied foundation model to capture and compress the world's most valuable data: the continuous, high-fidelity stream of physical interaction.
By creating a direct feedback loop between the model's decisions and the body's lived experience, we enable the emergence of a truly generalizable intelligence—one that understands not just how the world works, but how to act effectively within it.
This repository provides the training and inference code for our WALL-series open-source embodied foundation models. It includes end-to-end pipelines for data preparation (LeRobot), model configuration, flow-matching and FAST action branches, and evaluation utilities for real and simulated robots.
- We introduce WALL-OSS, an end-to-end embodied foundation model that leverages large-scale multimodal pretraining to achieve (1) embodiment-aware vision–language understanding, (2) strong language–action association, and (3) robust manipulation capability.
- WALL-OSS-FLOW: https://huggingface.co/x-square-robot/wall-oss-flow
- WALL-OSS-FAST: https://huggingface.co/x-square-robot/wall-oss-fast
Create and activate a conda environment:

```bash
conda create --name wallx python=3.10
conda activate wallx
```

Install requirements:

```bash
pip install -r requirements.txt
MAX_JOBS=4 pip install flash-attn==2.7.4.post1 --no-build-isolation
```

Install lerobot:
```bash
git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e .
```

Install wall_x:
```bash
git submodule update --init --recursive
MAX_JOBS=4 pip install --no-build-isolation --verbose .
```

Before training, please refer to workspace/README.md for detailed configuration instructions, including:

- Training script path configuration
- GPU setup
- Model and data paths
- Robot DOF configuration
- Training hyperparameters
Download the Flow/FAST pretrained model and run:

```bash
bash ./workspace/lerobot_example/run.sh
```
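If you do not yet have the pretrained weights locally, the sketch below shows one way to fetch them with huggingface_hub. The repo IDs come from the links above; the local directory is an illustrative assumption and should match the model path in your workspace configuration.

```python
# Minimal download sketch (assumes huggingface_hub is installed; local_dir is illustrative).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="x-square-robot/wall-oss-flow",   # or "x-square-robot/wall-oss-fast"
    local_dir="./checkpoints/wall-oss-flow",  # point the model path in your workspace config here
)
```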
For model inference, please refer to:

```bash
python ./scripts/fake_inference.py
```

This script demonstrates how to:

- Load the WALL-OSS model using Qwen2_5_VLMoEForAction.from_pretrained() (see the sketch after this list)
- Prepare input data, including proprioceptive information, attention masks, and dataset specifications
- Run inference in validation mode with the proper data types (bfloat16)
- Validate model outputs and check for numerical stability
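The following is a minimal loading sketch only: the import path and checkpoint location are assumptions, and input preparation is intentionally omitted; scripts/fake_inference.py remains the authoritative reference.

```python
# Sketch only: the import path and checkpoint path below are assumptions;
# see scripts/fake_inference.py for the full, authoritative input pipeline.
import torch
from wall_x.model import Qwen2_5_VLMoEForAction  # assumed import location

model = Qwen2_5_VLMoEForAction.from_pretrained(
    "./checkpoints/wall-oss-flow",  # local checkpoint downloaded above
    torch_dtype=torch.bfloat16,     # validation-mode inference runs in bfloat16
)
model.eval()
# Inputs (images, language instruction, proprioceptive state, attention masks,
# dataset specification) should be prepared as in scripts/fake_inference.py.
```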
To generate an open-loop comparison plot, run:

```bash
python ./scripts/draw_openloop_plot.py
```
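For orientation, an open-loop comparison overlays the model's predicted action trajectory against the recorded ground-truth actions for each degree of freedom. The sketch below is purely illustrative, using synthetic data rather than the script's actual plotting code.

```python
# Illustrative only: synthetic data standing in for recorded vs. predicted actions of one DOF.
import numpy as np
import matplotlib.pyplot as plt

timesteps = np.arange(100)
recorded = np.sin(timesteps / 10.0)                   # placeholder ground-truth trajectory
predicted = recorded + 0.05 * np.random.randn(100)    # placeholder open-loop predictions

plt.plot(timesteps, recorded, label="recorded action")
plt.plot(timesteps, predicted, label="predicted action (open loop)")
plt.xlabel("timestep")
plt.ylabel("action value (one DOF)")
plt.legend()
plt.savefig("openloop_comparison.png")
```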
If you find the WALL-OSS models useful, please cite:

```bibtex
@misc{walloss_paper_2025,
  title = {WALL-OSS: Igniting VLMs toward the Embodied Space},
  author = {X Square Robot},
  year = {2025},
  howpublished = {\url{https://x2robot.cn-wlcb.ufileos.com/wall_oss.pdf}},
  note = {White paper}
}
```