Commit 418f964
[BREAKING][algo] feat: Rollout Correction for General Off-Policy Problems (volcengine#3984)
## Summary
This PR overhauls the rollout correction system, adding typed configuration,
mathematical documentation, and performance optimizations.
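Rollout correction addresses the mismatch between the policy that generated the rollouts (the inference engine) and the policy being trained. One common way to correct for this is importance sampling: each token is reweighted by the ratio of the training policy's probability to the rollout policy's probability, and the ratio is truncated to bound its variance. The sketch below computes token-level and sequence-level truncated importance sampling weights from per-token log-probabilities. It is a generic illustration, not verl's implementation. The function names `token_is_weights` and `sequence_is_weight` and the default `threshold=2.0` are hypothetical and are not taken from this PR.

```python
import math
from typing import List, Sequence


def token_is_weights(
    train_logprobs: Sequence[float],
    rollout_logprobs: Sequence[float],
    threshold: float = 2.0,  # hypothetical cap, not a value from the PR
) -> List[float]:
    """Per-token truncated IS weights: min(pi_train / pi_rollout, threshold).

    The clip is applied in log space before exponentiating. Because exp is
    monotone, this equals min(exp(d), threshold) but cannot overflow when
    the log-ratio d is very large.
    """
    if len(train_logprobs) != len(rollout_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    log_cap = math.log(threshold)
    return [
        math.exp(min(lp_train - lp_rollout, log_cap))
        for lp_train, lp_rollout in zip(train_logprobs, rollout_logprobs)
    ]


def sequence_is_weight(
    train_logprobs: Sequence[float],
    rollout_logprobs: Sequence[float],
    threshold: float = 2.0,  # hypothetical cap, not a value from the PR
) -> float:
    """Single truncated IS weight for a whole sequence.

    The sequence ratio is the product of per-token ratios, which is the
    sum of per-token log-ratios in log space. The sum is clipped before
    it is exponentiated.
    """
    if len(train_logprobs) != len(rollout_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    log_ratio = sum(t - r for t, r in zip(train_logprobs, rollout_logprobs))
    return math.exp(min(log_ratio, math.log(threshold)))


if __name__ == "__main__":
    train = [-1.0, -0.5, -2.0]    # log pi_train for each sampled token
    rollout = [-1.0, -1.5, -1.0]  # log pi_rollout for the same tokens
    print(token_is_weights(train, rollout))  # middle token capped at 2.0
    print(sequence_is_weight(train, rollout))
```

Token-level weights limit the influence of any single mismatched token. A sequence-level weight treats the whole response as one sample, so it is unbiased before truncation but has much higher variance on long responses. Which variant to use, and where to truncate, is a design choice that the sketch leaves open.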
If you find the PR useful, please consider citing:
```bibtex
@misc{liu-li-2025,
title = {When Speed Kills Stability: Demystifying RL Collapse from the Inference-Training Mismatch},
url = {https://yingru.notion.site/When-Speed-Kills-Stability-Demystifying-RL-Collapse-from-the-Training-Inference-Mismatch-271211a558b7808d8b12d403fd15edda},
author = {Jiacai Liu and Yingru Li and Yuqian Fu and Jiawei Wang and Qian Liu and Yu Shen},
year = {2025},
month = sep,
}
```
1 parent 4bf4bd3, commit 418f964
28 files changed, +3659 -1919 lines changed:

- docs
  - advance
  - examples
- examples
  - rollout_correction
  - rollout_importance_sampling
- recipe
  - dapo
  - fully_async_policy
    - shell
  - one_step_off_policy
- tests/trainer/ppo
- verl
  - trainer
    - config
      - algorithm
    - ppo
  - workers/actor