Stars
rl
5 repositories
Solve Visual Understanding with Reinforced VLMs
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
RLHF experiments on a single A100 40G GPU. Supports PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, and reproducing DeepSeek R1-Zero.
Minimal reproduction of DeepSeek R1-Zero
verl: Volcano Engine Reinforcement Learning for LLMs