EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework

Used by Amazon Web Services

This project is a clean fork of the original veRL project that adds support for vision language models. We thank all the authors for providing such a high-performance RL training framework.

EasyR1 is efficient and scalable thanks to the HybridEngine design and the latest release of vLLM's SPMD mode.

Features

- Supported models
  - Llama3/Qwen2/Qwen2.5/Qwen3 language models
  - Qwen2-VL/Qwen2.5-VL/Qwen3-VL vision language models
  - DeepSeek-R1 distill models
- Supported algorithms
  - GRPO
  - DAPO
  - Reinforce++
  - ReMax
  - RLOO
  - GSPO
  - CISPO
- Supported datasets
  - Any text or vision-text dataset in a specific format (see Custom Dataset)
- Supported tricks
  - Padding-free training
  - Resuming from the latest/best checkpoint
  - Wandb, SwanLab, MLflow, and TensorBoard tracking

Requirements

Software Requirements

Python 3.9+
transformers>=4.54.0
flash-attn>=2.4.3
vllm>=0.8.3

We provide a Dockerfile to easily build environments.

We recommend using the pre-built Docker image for EasyR1.

docker pull hiyouga/verl:ngc-th2.8.0-cu12.9-vllm0.11.0
docker run -it --ipc=host --gpus=all hiyouga/verl:ngc-th2.8.0-cu12.9-vllm0.11.0

If your environment does not support Docker, you can consider using Apptainer:

apptainer pull easyr1.sif docker://hiyouga/verl:ngc-th2.8.0-cu12.9-vllm0.11.0
apptainer shell --nv --cleanenv --bind /mnt/your_dir:/mnt/your_dir easyr1.sif

Use USE_MODELSCOPE_HUB=1 to download models from the ModelScope hub.

Hardware Requirements

* estimated

Method	Bits	1.5B	3B	7B	32B	72B
GRPO Full Fine-Tuning	AMP	2*24GB	4*40GB	8*40GB	16*80GB	32*80GB
GRPO Full Fine-Tuning	BF16	1*24GB	1*40GB	4*40GB	8*80GB	16*80GB
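
As a quick way to check a cluster against the table above, the sketch below encodes the table as a lookup. It assumes each entry such as `4*40GB` means "4 GPUs with 40 GB of VRAM each"; the helper names `total_vram_gb` and `fits` are hypothetical and not part of EasyR1. The numbers are the same estimates as in the table, not measurements.

```python
# Estimates copied from the table above, read as (number of GPUs, VRAM per GPU in GB).
# This reading of "N*MGB" is an assumption.
GRPO_FULL_FT = {
    "AMP": {"1.5B": (2, 24), "3B": (4, 40), "7B": (8, 40), "32B": (16, 80), "72B": (32, 80)},
    "BF16": {"1.5B": (1, 24), "3B": (1, 40), "7B": (4, 40), "32B": (8, 80), "72B": (16, 80)},
}


def total_vram_gb(model_size: str, bits: str) -> int:
    """Aggregate VRAM (GB) implied by one table entry."""
    num_gpus, vram_per_gpu = GRPO_FULL_FT[bits][model_size]
    return num_gpus * vram_per_gpu


def fits(model_size: str, bits: str, num_gpus: int, vram_per_gpu_gb: int) -> bool:
    """True if the given hardware meets or exceeds the estimated table entry."""
    need_gpus, need_vram = GRPO_FULL_FT[bits][model_size]
    return num_gpus >= need_gpus and vram_per_gpu_gb >= need_vram


print(total_vram_gb("7B", "BF16"))   # aggregate estimate for 7B in BF16
print(fits("32B", "BF16", 4, 80))    # 4 GPUs are fewer than the estimated 8
```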

Note

Use worker.actor.fsdp.torch_dtype=bf16 and worker.actor.optim.strategy=adamw_bf16 to enable bf16 training.
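
The two options above are dotted config keys. A minimal sketch of rendering them as `key=value` overrides follows. It assumes overrides of this form are appended to the training command inside the example script, which should be checked against the script itself. `to_overrides` is a hypothetical helper, not an EasyR1 function.

```python
def to_overrides(options: dict) -> list:
    """Render {"a.b": "value"} as ["a.b=value"] strings (assumed override syntax)."""
    return [f"{key}={value}" for key, value in options.items()]


# The two keys below come from the note above.
bf16_options = {
    "worker.actor.fsdp.torch_dtype": "bf16",
    "worker.actor.optim.strategy": "adamw_bf16",
}

print(" ".join(to_overrides(bf16_options)))
```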

We are working hard to reduce VRAM usage in RL training. LoRA support will be integrated in upcoming updates.

Tutorial: Run Qwen2.5-VL GRPO on Geometry3K Dataset in Just 3 Steps

Installation

git clone https://github.com/hiyouga/EasyR1.git
cd EasyR1
pip install -e .

GRPO Training

bash examples/qwen2_5_vl_7b_geo3k_grpo.sh

Merge Checkpoint in Hugging Face Format

python3 scripts/model_merger.py --local_dir checkpoints/easy_r1/exp_name/global_step_1/actor
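
The command above targets `global_step_1`. To merge the most recent checkpoint instead, the sketch below finds the highest-numbered step directory. It assumes checkpoints follow the `global_step_<N>/actor` layout shown in the command; `latest_actor_dir` is a hypothetical helper, not part of EasyR1.

```python
import re
from pathlib import Path
from typing import Optional


def latest_actor_dir(experiment_dir: str) -> Optional[Path]:
    """Return <experiment_dir>/global_step_<N>/actor for the largest N, or None."""
    root = Path(experiment_dir)
    if not root.is_dir():
        return None
    best_step, best_dir = -1, None
    for child in root.glob("global_step_*"):
        match = re.fullmatch(r"global_step_(\d+)", child.name)
        # Skip step directories that have no actor checkpoint.
        if match and (child / "actor").is_dir() and int(match.group(1)) > best_step:
            best_step, best_dir = int(match.group(1)), child / "actor"
    return best_dir


actor_dir = latest_actor_dir("checkpoints/easy_r1/exp_name")
if actor_dir is not None:
    # Print the merge command for the newest checkpoint instead of running it.
    print(f"python3 scripts/model_merger.py --local_dir {actor_dir}")
```

Steps are compared as integers, so `global_step_10` ranks above `global_step_2`.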

Tip

If you encounter issues with connecting to Hugging Face, consider using export HF_ENDPOINT=https://hf-mirror.com.

If you want to use the SwanLab logger, consider using bash examples/qwen2_5_vl_7b_geo3k_swanlab.sh.

Custom Dataset

Please refer to the example datasets to prepare your own dataset.

Text dataset: https://huggingface.co/datasets/hiyouga/math12k
Image-text dataset: https://huggingface.co/datasets/hiyouga/geometry3k
Multi-image-text dataset: https://huggingface.co/datasets/hiyouga/journeybench-multi-image-vqa
Text-image mixed dataset: https://huggingface.co/datasets/hiyouga/rl-mixed-dataset
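
As a rough illustration of preparing your own data, the sketch below writes records to a JSON Lines file. The field names (`problem`, `answer`, `images`), the `<image>` placeholder, and the use of image file paths are all assumptions modeled on the example datasets. Verify them against those datasets and your training config before use.

```python
import json

# Hypothetical field names; verify them against the example datasets above.
PROMPT_KEY, ANSWER_KEY, IMAGE_KEY = "problem", "answer", "images"


def make_record(problem, answer, images=None):
    """Build one sample; the image field is only present for vision-text samples."""
    record = {PROMPT_KEY: problem, ANSWER_KEY: answer}
    if images:
        record[IMAGE_KEY] = list(images)
    return record


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


records = [
    make_record("What is 2 + 3?", "5"),
    # The "<image>" placeholder and the image path are illustrative assumptions.
    make_record("<image>How many sides does this polygon have?", "6",
                images=["images/hexagon.png"]),
]
# Example call: write_jsonl(records, "train.jsonl")
```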

How to Understand GRPO in EasyR1

To learn about the GRPO algorithm, you can refer to Hugging Face's blog.
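
The core idea of GRPO is that each sampled response is scored relative to the other responses for the same prompt, so no separate value model is needed. The sketch below shows that group-relative normalization in plain Python. It illustrates the general technique and is not EasyR1's code. The population standard deviation and the `eps` value are choices made here, and implementations may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group: A_i = (r_i - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four responses sampled for one prompt, rewarded 1.0 when the answer is correct.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)  # correct answers get positive advantages, incorrect ones negative
```

If every response in a group receives the same reward, all advantages are zero and the group provides no learning signal.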

How to Run 70B+ Model in Multi-node Environment

Start the Ray head node.

ray start --head --port=6379 --dashboard-host=0.0.0.0

Start the Ray worker node and connect to the head node.

ray start --address=<head_node_ip>:6379

Check the Ray resource pool.

ray status

Run the training script on the Ray head node only.

bash examples/qwen2_5_vl_7b_geo3k_grpo.sh

See veRL's official documentation for more details about multi-node training and the Ray debugger.
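
The resource check with `ray status` can also be done programmatically before launching training. The sketch below assumes a running cluster and uses `ray.cluster_resources()`, which returns a dict of resource totals. `has_enough_gpus` and the GPU count of 16 are hypothetical.

```python
def has_enough_gpus(resources: dict, required_gpus: int) -> bool:
    """Check a resource dict (shaped like ray.cluster_resources()) for enough GPUs."""
    return resources.get("GPU", 0) >= required_gpus


# On the head node of a running cluster (requires the ray package):
#   import ray
#   ray.init(address="auto")
#   print(has_enough_gpus(ray.cluster_resources(), 16))

# Stand-alone example with a hand-written resource dict:
print(has_enough_gpus({"CPU": 128.0, "GPU": 16.0}, 16))
```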

Other Baselines

We also reproduced the following two baselines of the R1-V project.

CLEVR-70k-Counting: Train the Qwen2.5-VL-3B-Instruct model on counting problems.
GeoQA-8k: Train the Qwen2.5-VL-3B-Instruct model on GeoQA problems.

Performance Baselines

See baselines.md.

Awesome Work using EasyR1

MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources.
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models.
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement.
MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse.
Temporal-R1: Envolving Temporal Reasoning Capability into LMMs via Temporal Consistent Reward.
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation.
GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents.
R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning.
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning.
MM-UPT: Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO.
RL-with-Cold-Start: Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start.
ViGoRL: Grounded Reinforcement Learning for Visual Reasoning.
Revisual-R1: Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning.
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward.
Vision-Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning.
VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use.
Long-RL: Scaling RL to Long Sequences.
EditGRPO: Reinforcement Learning with Post-Rollout Edits for Clinically Accurate Chest X-Ray Report Generation.
ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping.
VPPO: Spotlight on Token Perception for Multimodal Reinforcement Learning.

TODO

Support LoRA (high priority).
Support ulysses parallelism for VLMs (medium priority).
Support more VLM architectures.

Note

We will not provide scripts for supervised fine-tuning and inference in this project. If you have such requirements, we recommend using LLaMA-Factory.

Known bugs

These features are temporarily disabled. We plan to fix them one by one in future updates.

Vision language models are not compatible with ulysses parallelism yet.

Discussion Group

👋 Join our WeChat group.

FAQs

ValueError: Image features and image tokens do not match: tokens: 8192, features 9800

Increase the data.max_prompt_length or reduce the data.max_pixels.
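
To see why these two settings interact, the sketch below estimates how many tokens an image consumes. It assumes the Qwen2-VL/Qwen2.5-VL layout, where 14x14 patches are merged 2x2 so that one image token covers roughly 28x28 pixels. Other models, and the processor's resizing rules, may give different counts. Both helper names are hypothetical.

```python
PIXELS_PER_TOKEN = 28 * 28  # assumption: 14x14 patches merged 2x2 (Qwen2-VL / Qwen2.5-VL)


def estimate_image_tokens(num_pixels: int, pixels_per_token: int = PIXELS_PER_TOKEN) -> int:
    """Approximate number of image tokens produced by an image of num_pixels pixels."""
    return num_pixels // pixels_per_token


def max_pixels_for_budget(token_budget: int, pixels_per_token: int = PIXELS_PER_TOKEN) -> int:
    """Largest pixel count whose image tokens fit within token_budget."""
    return token_budget * pixels_per_token


# The budget must also leave room for the text part of the prompt.
print(estimate_image_tokens(1280 * 960))
print(max_pixels_for_budget(4096))
```

Under this assumption, the 9800 image features in the message above need at least 9800 prompt tokens, which explains why a prompt limited to 8192 tokens cannot hold them.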

RuntimeError: CUDA Error: out of memory at /workspace/csrc/cumem_allocator.cpp:62

Reduce the worker.rollout.gpu_memory_utilization and enable worker.actor.offload.offload_params.

RuntimeError: 0 active drivers ([]). There should only be one.

Uninstall deepspeed from the current python environment.

Citation

Core contributors: Yaowei Zheng, Junting Lu, Shenzhi Wang, Zhangchi Feng, Dongdong Kuang and Yuwen Xiong

We also thank Guangming Sheng and Chi Zhang for helpful discussions.

@misc{zheng2025easyr1,
  title        = {EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework},
  author       = {Yaowei Zheng and Junting Lu and Shenzhi Wang and Zhangchi Feng and Dongdong Kuang and Yuwen Xiong},
  howpublished = {\url{https://github.com/hiyouga/EasyR1}},
  year         = {2025}
}

We also recommend citing the original work.

@article{sheng2024hybridflow,
  title   = {HybridFlow: A Flexible and Efficient RLHF Framework},
  author  = {Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu},
  year    = {2024},
  journal = {arXiv preprint arXiv:2409.19256}
}

