Commit 0eb50ec
[rollout] fix: resolve agent loop config path in multi-node Ray training (volcengine#4029)
### What does this PR do?
Fixes agent loop configuration file path resolution in multi-node Ray
training environments.
**Problem:** When running multi-node training, relative paths to agent
loop config files fail on remote worker nodes with `FileNotFoundError`
because the working directory differs across nodes.
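The failure mode can be reproduced without Ray. The sketch below is illustrative only: the two temporary directories are hypothetical stand-ins for the working directories of a driver process and a remote worker, and `resolved_from` is a helper invented for this example.

```python
import os
import tempfile

# The same relative path that appears in the training config.
relative = "recipe/langgraph_agent/example/agent.yaml"


def resolved_from(cwd: str, path: str) -> str:
    """Return the absolute path that `path` maps to for a process running in `cwd`."""
    previous = os.getcwd()
    os.chdir(cwd)
    try:
        return os.path.abspath(path)
    finally:
        os.chdir(previous)


# Hypothetical stand-ins for the driver's and a worker's working directories.
with tempfile.TemporaryDirectory() as driver_dir, tempfile.TemporaryDirectory() as worker_dir:
    on_driver = resolved_from(driver_dir, relative)
    on_worker = resolved_from(worker_dir, relative)

# The same relative string points at two different files.
print(on_driver != on_worker)
```

Because a relative path is always interpreted against the current working directory, a file that exists next to the driver is not necessarily visible under the same relative path on a worker.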
**Solution:** Updated `resolve_config_path()` to resolve relative paths
against the verl package installation location, so the lookup no longer
depends on the working directory of the process that performs it.
Related issue: Agent loop config files cannot be loaded in multi-node
setups.
### Test
**Testing approach:**
- Tested with 2-node Ray cluster (4 GPUs total)
- Configuration: `recipe/langgraph_agent/example/agent.yaml`
- Results: Config file successfully resolved on all remote nodes
**Before fix:**
```shell
FileNotFoundError: [Errno 2] No such file or directory: '/dfs/data/recipe/langgraph_agent/example/agent.yaml'
```
**After fix:**
```shell
[DEBUG] Found file at verl base path: /dfs/data/work/verl/recipe/langgraph_agent/example/agent.yaml
Training proceeds successfully
```
### API and Usage Example
**No API changes.** The fix is internal to the `resolve_config_path()`
helper function.
Users continue to use relative paths in config as before:
```yaml
rollout:
  agent:
    agent_loop_config_path: "recipe/langgraph_agent/example/agent.yaml"
```
The path resolution now works correctly across all nodes.
### Design & Code Changes
__File Changed:__ `verl/experimental/agent_loop/agent_loop.py`
__Function Modified:__ `resolve_config_path()`
__Key Changes:__
1. Removed hardcoded path fallbacks
2. Added dynamic path resolution using `verl.__file__` to locate project
root
3. Raised `FileNotFoundError` with a clearer message when the file cannot be found
__Resolution Strategy:__
1. If absolute path → return as-is
2. Try current working directory
3. Try relative to verl package installation (e.g.,
`/path/to/verl/recipe/...`)
4. Raise clear error if not found
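
The resolution strategy above can be sketched roughly as follows. This is not the code from the PR: the function name `resolve_config_path_sketch` and the `package_file` parameter are hypothetical, and the sketch assumes the project root is the parent of the package directory (which matches the `/dfs/data/work/verl/recipe/...` path in the "After fix" log). In the real helper, `package_file` would presumably be `verl.__file__`.

```python
import os


def resolve_config_path_sketch(config_path: str, package_file: str) -> str:
    """Resolve `config_path` following the four steps listed above.

    `package_file` stands in for `verl.__file__` (an assumption made so the
    sketch runs without verl installed).
    """
    # 1. Absolute paths are returned unchanged.
    if os.path.isabs(config_path):
        return config_path

    # 2. Try the current working directory.
    cwd_candidate = os.path.abspath(config_path)
    if os.path.exists(cwd_candidate):
        return cwd_candidate

    # 3. Try relative to the project root, assumed to be the parent of the
    #    directory that contains the package's __init__.py.
    project_root = os.path.dirname(os.path.dirname(os.path.abspath(package_file)))
    package_candidate = os.path.join(project_root, config_path)
    if os.path.exists(package_candidate):
        return package_candidate

    # 4. Fail with a message that lists every location that was tried.
    raise FileNotFoundError(
        f"Config file {config_path!r} not found. "
        f"Tried: {cwd_candidate}, {package_candidate}"
    )
```

Listing the attempted locations in the error message is a design choice of this sketch; it makes it easier to see which node-local directory was searched when the lookup fails on a remote worker.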
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

1 parent 6d6ccb0 commit 0eb50ec

2 files changed, +62 -1 lines changed

- First file: 1 line added (new line 32); old line 284 replaced by new lines 285-286.
- Second file: 59 lines added (new lines 15-73).