# Interactive Post-Training for Vision-Language-Action Models
Official implementation of RIPT-VLA. Parts of the repo are built on a fork of QueST.
RIPT-VLA improves any pretrained VLA backbone (e.g., QueST, OpenVLA-OFT) using only sparse binary success rewards.
Through K-rollout interaction, dynamic sampling, and leave-one-out advantage estimation, RIPT-VLA reaches state-of-the-art success rates and remains effective in extremely low-data regimes.
Authors: Shuhan Tan, Kairan Dou, Yue Zhao, Philipp Krähenbühl
Contact: <[email protected]>
- Plug-and-Play Post-Training – fine-tune any VLA model with only task-success signals (no dense rewards, no value nets).
- SOTA performance – 94.3% success rate on LIBERO-90 with QueST + RIPT; 97.5% success rate on the LIBERO suites (Goal, Spatial, Object, Long) with OpenVLA-OFT + RIPT.
- Extreme low-data success – RIPT-VLA turns failure-prone models (e.g., 4% success with 1 demo) into performant agents (97%+ success) using only sparse binary rewards and just 15 iterations.
- Getting Started
- Model Zoo
- QueST RIPT Training
- OpenVLA-OFT RIPT Training
- Core RIPT Code Overview
- Citation
- Acknowledgement
## News

2025-05: Initial code release v0.1.0!
- QueST → RIPT-VLA (LIBERO-90, LIBERO-Suites).
- OpenVLA-OFT → RIPT-VLA (LIBERO-Suites).
- Repo cleanup for first release.
## TODO

- Meta-World experiments.
- Better modularization of RIPT.
## Getting Started

Follow the instructions in INSTALL.md for QueST and OpenVLA-OFT.
Replace the following paths in `config/paths.yaml`:

```yaml
paths:
  output_prefix: /path/to/experiment/output  # Checkpoint and log output directory
  data_prefix: /path/to/libero/data          # LIBERO data directory
  wandb_project: ript-vla                    # Your wandb project name
```
This is an example of how to RIPT an SFT QueST model on LIBERO-90:
- Install QueST and LIBERO following INSTALL.md.
- Download the pre-trained SFT model here (check the Model Zoo section for more checkpoints).
- (Optional) Evaluate the SFT model on LIBERO-90:
  - Fill in `checkpoint_path` with the SFT checkpoint path in the evaluation script `scripts/quest/eval/libero_90.sh`.
  - Run the evaluation script with the number of GPUs you want to use:

    ```bash
    bash scripts/quest/eval/libero_90.sh $NUM_GPU
    ```
- RIPT the SFT model on LIBERO-90:
  - Fill in `checkpoint_path` with the SFT checkpoint path in the RIPT script `scripts/quest/stage_3_ript/libero_90.sh`.
  - Run the RIPT script with the number of GPUs you want to use:

    ```bash
    bash scripts/quest/stage_3_ript/libero_90.sh $NUM_GPU
    ```
The script will:

- Load the pre-trained SFT model
- Run RIPT training on LIBERO-90
- Log results to WandB

Use `$NUM_GPU` to specify the number of GPUs to use (at least 3 GPUs recommended).
For complete details of the training process, see the QueST RIPT Training section.
## Model Zoo

All checkpoints are hosted at the Hugging Face Model Hub: https://huggingface.co/tanshh97/RIPT_VLA/tree/main
We provide both SFT and RIPT checkpoints for QueST on multiple LIBERO suites:
| Benchmark | Model | SFT Checkpoint | RIPT Checkpoint |
|---|---|---|---|
| LIBERO-90 | QueST | Download | Download |
| LIBERO-GOAL | QueST | Download | Download |
| LIBERO-LONG | QueST | Download | Download |
| LIBERO-OBJECT | QueST | Download | Download |
| LIBERO-SPATIAL | QueST | Download | Download |
All models are ~80MB.
We provide SFTed scale heads and RIPT LoRA adaptors for OpenVLA-OFT on the LIBERO suites:
| Benchmark | Model | SFT Scale Head | RIPT LoRA Adaptor |
|---|---|---|---|
| LIBERO-GOAL | OpenVLA-OFT | Download | Download |
| LIBERO-LONG | OpenVLA-OFT | Download | Download |
| LIBERO-OBJECT | OpenVLA-OFT | Download | Download |
| LIBERO-SPATIAL | OpenVLA-OFT | Download | Download |
The Laplace scale heads are around 300 MB, and the RIPT LoRA adaptor + header checkpoints are around 1 GB.
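All files can be fetched with `huggingface_hub`. A minimal sketch, assuming a hypothetical checkpoint filename (check the repo's file listing for the actual names):

```python
# Sketch: download a checkpoint from the RIPT_VLA Hub repo.
# The filename below is a placeholder, not a guaranteed file name.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="tanshh97/RIPT_VLA",
    filename="quest_libero90_ript.pth",  # hypothetical; see the repo file listing
)
print(f"Checkpoint cached at {ckpt_path}")
```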
## QueST RIPT Training

Activate the `ript-vla` conda environment and run the following commands:
### Stage 1: Autoencoder Training

- You can skip this stage if you are using SFT checkpoints from the Model Zoo.
- Run `scripts/quest/stage_1_autoencoder/libero_*.sh` for the different LIBERO suites.
- This stage only trains the QueST autoencoder, which is then used for SFT.
- Only 1 GPU is needed.
- Example for LIBERO-90:

  ```bash
  bash scripts/quest/stage_1_autoencoder/libero_90.sh
  ```
### Stage 2: Supervised Fine-Tuning (SFT)

- You can skip this stage if you are using SFT checkpoints from the Model Zoo.
- Run `scripts/quest/stage_2_sft/libero_*.sh` for the different LIBERO suites.
- This stage conducts supervised fine-tuning of QueST.
- Only 1 GPU is needed.
- Example for LIBERO-90:

  ```bash
  bash scripts/quest/stage_2_sft/libero_90.sh
  ```
### Stage 3: RIPT Training

- Check `scripts/quest/stage_3_ript/libero_*.sh` for the different LIBERO suites.
- Fill in `checkpoint_path` with the SFT checkpoint path (either from Stage 2 or downloaded from the Model Zoo).
- Use `$NUM_GPU` to specify the number of GPUs (3 GPUs recommended for the LIBERO suites, 6 GPUs for LIBERO-90).
- This stage conducts RIPT training of QueST.
- Example for LIBERO-90:

  ```bash
  bash scripts/quest/stage_3_ript/libero_90.sh $NUM_GPU
  ```
Key flags:

- `algo.rloo_batch_size`: number of rollouts to use for RLOO K-sampling (default: 8)
- `algo.num_ppo_epochs`: number of PPO epochs (default: 20)
- `algo.ppo_batch_size`: number of PPO batches (default: 6 = 1 × 6 GPUs)
- `train_dataloader.batch_size`: batch size for training (default: 180 = 30 initializations × 6 GPUs)
- `training.n_steps`: number of training steps (default: 15)
- `training.rollout_steps`: rollout interval in training steps (default: 3)
- `algo.enable_dynamic_sampling`: enable dynamic sampling (default: true)
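To make the RLOO K-sampling concrete, here is a minimal sketch (not the repo's actual code) of leave-one-out advantage estimation over a group of `algo.rloo_batch_size` rollouts that share the same task context:

```python
import torch

def leave_one_out_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """RLOO advantages for K rollouts sampled from one context.

    rewards: shape (K,), sparse binary success rewards.
    """
    K = rewards.shape[0]
    # Baseline for rollout i is the mean reward of the other K - 1 rollouts.
    baselines = (rewards.sum() - rewards) / (K - 1)
    return rewards - baselines

# With rloo_batch_size = 8 and 3 successful rollouts:
rewards = torch.tensor([1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0])
print(leave_one_out_advantages(rewards))  # successes ~ +0.71, failures ~ -0.43
```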
### Evaluation

- Run `scripts/quest/eval/libero_*.sh` for the different LIBERO suites.
- Fill in `checkpoint_path` with the RIPT/SFT checkpoint path.
- Example for LIBERO-90:

  ```bash
  bash scripts/quest/eval/libero_90.sh $NUM_GPU
  ```
## OpenVLA-OFT RIPT Training

Activate the `ript_vla_openvla_oft` conda environment and run the following commands:
### Checkpoint Preparation

- We directly use the pre-trained and SFTed OpenVLA-OFT checkpoints from the OpenVLA-OFT repo.
- Download the OpenVLA-OFT full model from the official OpenVLA-OFT repo for each task suite.
- Download the SFTed scale head from the Model Zoo for each task suite.
### Stage 3: RIPT Training

- Check `scripts/openvla_oft/stage_3_ript/libero_*.sh` for the different LIBERO suites.
- Fill in `checkpoint_path` with the SFT checkpoint path from the official OpenVLA-OFT repo.
- Fill in `header_checkpoint` with the SFTed scale head from the Model Zoo.
- Fill in `lora_adaptor_ckpt` with the RIPT LoRA adaptor checkpoint:
  - With the model from the Model Zoo if fine-tuning from a current RIPT model, or
  - With `null` if fine-tuning from the SFT model.
- Use `$NUM_GPU` to specify the number of GPUs (4 GPUs recommended for the LIBERO suites).
- This stage conducts RIPT training of OpenVLA-OFT.
- Example for LIBERO-LONG:

  ```bash
  bash scripts/openvla_oft/stage_3_ript/libero_long.sh $NUM_GPU
  ```
### Evaluation

- Run `scripts/openvla_oft/eval/libero_*.sh` for the different LIBERO suites.
- Fill in `checkpoint_path` with the SFT checkpoint path from the official OpenVLA-OFT repo.
- Fill in `header_checkpoint` with the SFTed scale head from the Model Zoo.
- Fill in `lora_adaptor_ckpt` with the RIPT LoRA adaptor checkpoint:
  - With the RIPT model saved from Stage 3, or
  - With the SFTed LoRA adaptor checkpoint from the Model Zoo (if fine-tuning from the SFT model), or
  - Set to `null` to evaluate the SFT model.
- Example for LIBERO-LONG:

  ```bash
  bash scripts/openvla_oft/eval/libero_long.sh $NUM_GPU
  ```
## Core RIPT Code Overview

### ModelAdapter

`ModelAdapter` (`ript/algos/rl_optimizers/model_interface.py`) provides an adaptation layer that connects the VLA models to the RIPT optimizer. It handles:

- Computing action log probabilities
- Retrieving the policy model for optimization
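The exact contract is defined in `model_interface.py`; as an illustration only, an adapter might look roughly like this (method names are hypothetical, not the repo's API):

```python
from abc import ABC, abstractmethod

import torch

class VLAAdapterSketch(ABC):
    """Hypothetical sketch of the adapter contract between a VLA and RIPT."""

    @abstractmethod
    def compute_log_probs(self, context: dict, actions: torch.Tensor) -> torch.Tensor:
        """Return log probabilities of the given actions under the current policy."""

    @abstractmethod
    def get_policy(self) -> torch.nn.Module:
        """Return the trainable policy module handed to the RL optimizer."""
```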
### LiberoRunner_rl

`LiberoRunner_rl` (`ript/env_runner/libero_runner.py`) handles the interaction with the LIBERO environment, with context caching:

- Supports batch rollout generation
- Manages task-specific context caching (e.g., context tokens, output action indices) for action log probability computation
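Why cache context? Recomputing action log probabilities during the PPO update requires the exact inputs each action was conditioned on at rollout time. A minimal sketch of this idea (field names are illustrative, not the repo's actual structures):

```python
from dataclasses import dataclass, field

import torch

@dataclass
class RolloutCache:
    """Illustrative per-task cache of what log-prob recomputation needs."""
    context_tokens: list = field(default_factory=list)  # model inputs per step
    action_indices: list = field(default_factory=list)  # sampled action tokens per step

caches: dict[str, RolloutCache] = {}  # keyed by task name

def record_step(task: str, tokens: torch.Tensor, action_idx: torch.Tensor) -> None:
    # Store what the policy saw and what it output, for later log-prob queries.
    cache = caches.setdefault(task, RolloutCache())
    cache.context_tokens.append(tokens)
    cache.action_indices.append(action_idx)
```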
### RolloutGenerator

`RolloutGenerator` (`ript/algos/rl_optimizers/rollout_generator.py`) handles the generation of rollouts for RL training with dynamic sampling:

- Manages environment interactions
- Gathers rollouts across environments
- Supports early stopping to improve efficiency
- Handles distributed rollout generation
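Dynamic sampling (the `algo.enable_dynamic_sampling` flag above) keeps only rollout groups that carry a learning signal: if all K rollouts for a context receive the same binary reward, the leave-one-out advantages are all zero and the group can be skipped. A hedged sketch of that filtering rule:

```python
import torch

def has_learning_signal(rewards: torch.Tensor) -> bool:
    """Keep a group only if its K rollouts received mixed rewards.

    All-success or all-failure groups yield zero leave-one-out
    advantages and contribute no gradient.
    """
    return bool(rewards.min() != rewards.max())

groups = {
    "task_a": torch.tensor([1.0, 1.0, 1.0, 1.0]),  # all success -> skipped
    "task_b": torch.tensor([1.0, 0.0, 1.0, 0.0]),  # mixed -> kept
}
kept = {name: r for name, r in groups.items() if has_learning_signal(r)}
print(list(kept))  # ['task_b']
```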
### RLOptimizer

`RLOptimizer` (`ript/algos/rl_optimizers/rl_optimizer.py`) implements the Leave-One-Out PPO (LOOP) algorithm:

- Processes generated rollouts
- Computes rewards and advantages with leave-one-out advantage estimation
- Applies PPO updates to the policy
- Collects and returns optimization metrics
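The PPO update pairs the leave-one-out advantages (sketched in the key-flags section above) with the standard clipped surrogate objective. A minimal, self-contained sketch (hyperparameters are illustrative):

```python
import torch

def ppo_clip_loss(new_logp: torch.Tensor,
                  old_logp: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2) -> torch.Tensor:
    """Standard PPO clipped surrogate loss over a batch of actions."""
    ratio = torch.exp(new_logp - old_logp)  # importance ratio pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Maximize the surrogate, so return its negation as the loss to minimize.
    return -torch.min(unclipped, clipped).mean()
```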
### Adding a New Model

- Implement the `ModelAdapter` for the new model, following the current `ModelAdapter` interface.
- Implement the `LiberoRunner_rl` for the new model to cache model context for action log probability computation. For example, `LiberoRunner_rl` caches the context tokens and output action indices for QueST.
- Add the new model and RIPT configs following the existing `quest_rl.yaml` and `openvla_oft_rl.yaml` format.
## Citation

If you find this work useful, please cite:
```bibtex
@misc{tan2025interactiveposttrainingvisionlanguageactionmodels,
      title={Interactive Post-Training for Vision-Language-Action Models},
      author={Shuhan Tan and Kairan Dou and Yue Zhao and Philipp Krähenbühl},
      year={2025},
      eprint={2505.17016},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.17016},
}
```
## Acknowledgement

RIPT-VLA builds on the open-source efforts of:
- QueST (Mete et al., 2024) - We forked the QueST repo and built RIPT-VLA on top of it; we trained and fine-tuned our SFT QueST models with the original QueST codebase.
- OpenVLA-OFT (Kim et al., 2025) - We fine-tuned from the released OpenVLA-OFT checkpoint.
- LIBERO Benchmark (Liu et al., 2023) - We used the LIBERO-90, LIBERO-Goal, LIBERO-Long, LIBERO-Object, and LIBERO-Spatial benchmarks.
We sincerely thank the authors of the above projects for their open-source contributions.