This repository contains the codebase for the paper "Humanoid Policy ~ Human Policy".
It trains egocentric (i.e., no wrist cameras) humanoid manipulation policies, with minimal wrappers to keep the focus on the core components.
- assets: robot URDFs and meshes
- cet: Mujoco simulation for replaying episodes and rolling out policies, used for code development and adding new embodiments
- configs: configs for robots and simulation environments
- data: placeholder for data, with some visualization scripts
- docs: documentation
- hdt: main learning framework
- human_data: scripts and interface for collecting human demonstration data
- sim_test (legacy): legacy ALOHA cube-transfer test, kept as a dummy example for sanity checks
Supported policies:
- ACT (with options to use ResNet, DINOv2, or CLIP backbones)
- Vanilla DP (Diffusion Policy; based on the official Colab)
- RDT (the trainer works, but it has not been tested)
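Since the training commands below all pass a `--chunk_size`, here is a brief illustration of the action-chunking idea these policies share: given the current observation, the network predicts a whole chunk of future actions at once rather than a single step. This is a conceptual sketch only; names like `obs_dim` and `act_dim` are placeholders, not the repo's interfaces.

```python
import torch
import torch.nn as nn

class ChunkedPolicy(nn.Module):
    """Toy policy illustrating action chunking (as in ACT): one forward
    pass predicts chunk_size future actions instead of a single action."""

    def __init__(self, obs_dim: int = 64, act_dim: int = 26, chunk_size: int = 100):
        super().__init__()
        self.chunk_size, self.act_dim = chunk_size, act_dim
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256),
            nn.ReLU(),
            nn.Linear(256, chunk_size * act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # (batch, obs_dim) -> (batch, chunk_size, act_dim)
        return self.net(obs).view(-1, self.chunk_size, self.act_dim)

policy = ChunkedPolicy()
actions = policy(torch.zeros(1, 64))
print(actions.shape)  # torch.Size([1, 100, 26])
```

At rollout time, executing part or all of a chunk before re-querying the policy amortizes inference cost and tends to smooth the executed trajectory.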
Clone the codebase:

```bash
cd ~
git clone --recursive https://github.com/RogerQi/human-policy
```

Then follow INSTALL.md to install the required dependencies.
We open-source recordings on HuggingFace:
- Human data: many humans performing the tasks described in the paper in diverse in-the-wild scenes.
- Real-robot data: two Unitree H1 humanoid robots, physically located at UCSD and CMU. Collected via teleoperation.
- Simulation data: one Unitree H1 humanoid robot in Mujoco. Collected via teleoperation.
To download the data, run:

```bash
cd data/recordings
bash download_data.sh
```

We provide scripts to examine the actions and visual observations in the downloaded data.
```bash
cd data/
# Check the argparse options inside each script to change the data path for visualization
python plot_keypoints.py
python plot_visual_obs.py
```

Human data is a scalable source for manipulation policy learning, and we believe humanoid policies should make good use of it. To process your own human/humanoid data into our format, please refer to these files:
- data/plot_keypoints.py: 3D visualization of human-centric representations
- docs/humanoid_mujoco.md: replay processed data in Mujoco to make sure representations are well-aligned
- hdt/constants.py: element-wise interpretation of human-centric representations in trainable formats
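As a sanity check while converting your own data, it can help to walk the HDF5 tree of a processed episode and compare it against hdt/constants.py. The snippet below is a minimal sketch; the episode path is an example placeholder, and the key names it prints from your files (not any assumed here) are the ground truth.

```python
import h5py

# Example path; point this at any processed episode you downloaded or generated.
episode_path = "data/recordings/processed/<task_dir>/processed_episode_0.hdf5"

def print_dataset(name, obj):
    # visititems callback: report every dataset with its shape and dtype.
    if isinstance(obj, h5py.Dataset):
        print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")

with h5py.File(episode_path, "r") as f:
    f.visititems(print_dataset)
```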
Let's take the toy humanoid manipulation data in Mujoco and a simple ACT policy with a ResNet backbone as an example. Other data/model options are available in the model configs and dataset configs.
To launch simple training on a single GPU (with at least 24 GB of VRAM), run:

```bash
python main.py --chunk_size 100 --batch_size 64 --num_epochs 50000 --lr 1e-4 --seed 0 --exptid 'mujoco_sim_test_resnet_100cs' --dataset_json_path configs/datasets/mujoco_sim.json --model_cfg_path configs/models/act_resnet.yaml --no_wandb
```

For more sophisticated training, such as BF16, torch.compile, or multi-GPU training, the training script supports Hugging Face Accelerate.
Start by creating an Accelerate config file:

```bash
accelerate config --config_file ./accelerator_setup.yaml
```

Then launch training:

```bash
accelerate launch --config_file ./accelerator_setup.yaml main.py --chunk_size 100 --batch_size 64 --num_epochs 50000 --lr 1e-4 --seed 0 --exptid 'mujoco_sim_test_resnet_100cs' --dataset_json_path configs/datasets/mujoco_sim.json --model_cfg_path configs/models/act_resnet.yaml --no_wandb
```

For policies without complex architectures, such as ACT, we recommend using the val_and_jit_trace option to create traced models.
```bash
accelerate launch --config_file ./accelerator_setup.yaml main.py --chunk_size 100 --batch_size 64 --num_epochs 50000 --lr 1e-4 --seed 0 --exptid 'mujoco_sim_test_resnet_100cs' --dataset_json_path configs/datasets/mujoco_sim.json --model_cfg_path configs/models/act_resnet.yaml --no_wandb --val_and_jit_trace
```

By default, the main trainer uses accelerator.load_state to resume model/optimizer states from the latest checkpoint in the (exptid) directory. The val_and_jit_trace flag then skips the training loop and uses the data format from the training data loader to create a traced model at (exptid)/policy_traced.pt.
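Once tracing succeeds, the policy can be loaded without any of the training code. The sketch below is illustrative only: the file paths follow the example run above, while the stats pickle contents and the traced model's input signature are assumptions, so inspect both before deploying.

```python
import pickle
import torch

# Artifacts from the example run above.
stats_path = "hdt/mujoco_sim_test_resnet_100cs/dataset_stats.pkl"
policy_path = "hdt/mujoco_sim_test_resnet_100cs/policy_traced.pt"

# Dataset statistics used for (de)normalization; key names vary, so inspect them.
with open(stats_path, "rb") as f:
    stats = pickle.load(f)
print(list(stats))

# A traced model needs no Python class definitions, just torch.jit.load.
policy = torch.jit.load(policy_path)
policy.eval()

# Hypothetical dummy forward pass: the argument order, image layout, and state
# dimension below are placeholders; match them to your model config.
image = torch.zeros(1, 3, 224, 224)
state = torch.zeros(1, 32)
with torch.no_grad():
    action_chunk = policy(state, image)
print(action_chunk.shape)  # expected: (1, chunk_size, action_dim)
```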
Continuing from the previous example, after the policy is trained and traced, we can roll it out in Mujoco or on the real robot. The required components are the dataset statistics and the traced policy weights. We include an example command below; for complete details, please refer to docs/humanoid_mujoco.md.
```bash
cd ../cet
python mujoco_rollout_replay.py --hdf_file_path ../data/recordings/processed/1061new_sim_pepsi_grasp_h1_2_inspire-2025_02_11-22_20_48/processed_episode_0.hdf5 --norm_stats_path ../hdt/mujoco_sim_test_resnet_100cs/dataset_stats.pkl --plot --model_path ../hdt/mujoco_sim_test_resnet_100cs/policy_traced.pt --tasktype pepsi --chunk_size 100 --policy_config_path ../hdt/configs/models/act_resnet.yaml
```

After setting up the ZED camera mount following our hardware documentation, you can start collecting human data.
First, initialize and update the opentv submodule:
```bash
git submodule update --init --recursive
```

Then follow the README to complete the environment setup.
Run the following command to start the data collection process:
```bash
cd ./human_data
python human_data.py --des task_name --description "description of the task"
```

- `--des`: a short name for the task (e.g., `pouring`, `cutting`)
- `--description`: a more detailed description of the task
We use simple hand gestures to control the data collection flow:
- Record Gesture: Start and stop recording a demonstration.
- Drop Gesture: Cancel the current recording.
The following diagram shows the internal state transitions during the data collection process:
- Use the Record Gesture to enter and exit the `RECORDING` state.
- Use the Drop Gesture to cancel the current recording and return to `WAITING`.
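To make the transitions concrete, here is a minimal sketch of the same state machine in Python. The state and gesture names mirror the list above; the actual implementation lives in human_data.py and may structure this differently.

```python
from enum import Enum, auto

class State(Enum):
    WAITING = auto()
    RECORDING = auto()

class Gesture(Enum):
    RECORD = auto()  # start/stop recording a demonstration
    DROP = auto()    # cancel the current recording

def next_state(state: State, gesture: Gesture) -> State:
    # Record Gesture toggles between WAITING and RECORDING.
    if gesture is Gesture.RECORD:
        return State.RECORDING if state is State.WAITING else State.WAITING
    # Drop Gesture cancels an in-progress recording and returns to WAITING.
    if gesture is Gesture.DROP and state is State.RECORDING:
        return State.WAITING
    return state

assert next_state(State.WAITING, Gesture.RECORD) is State.RECORDING
assert next_state(State.RECORDING, Gesture.DROP) is State.WAITING
```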
Run the following command to post-process the collected data:

```bash
cd ./cet
python post_process_zed.py --taskid task_name --multiprocess
```

TODO:
- Add teleoperation scripts to collect more Mujoco data
- Alleviate the known 'sticky finger' friction issue in Mujoco sim rollout
- Add an example of forward kinematics / retargeting for a new humanoid