
Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning


Zican Hu¹,², Wei Liu³, Xiaoye Qu², Xiangyu Yue⁴, Chunlin Chen¹, Zhi Wang¹,², Yu Cheng⁴

¹Nanjing University  ²Shanghai AI Laboratory  ³The Hong Kong University of Science and Technology  ⁴The Chinese University of Hong Kong

Overview

[Figure: GLIDER overview]

Citation

If you find our paper useful, please consider starring this repository and citing it:

@inproceedings{hu2024divide,
      title={Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning},
      author={Zican Hu and Wei Liu and Xiaoye Qu and Xiangyu Yue and Chunlin Chen and Zhi Wang and Yu Cheng},
      year={2025},
      booktitle={Proceedings of the 42nd International Conference on Machine Learning}
}

Instructions

GLIDER is tested on two benchmark tasks: ScienceWorld and ALFWorld. Follow the instructions in the ScienceWorld and ALFWorld repositories to install them. Create a virtual environment using conda, and see the requirements.txt file for details on installing the dependencies.

conda create -n glider python=3.10 -y
conda activate glider
pip install -r requirements.txt
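
Both benchmark environments also ship PyPI packages; a minimal sketch of installing them directly (the package names are assumptions here; check each repository for additional setup steps such as downloading game data):

pip install scienceworld   # ScienceWorld text environment (assumed package name)
pip install alfworld       # ALFWorld text environment (assumed package name)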

Experiments

SFT

Run SFT training with the following script, using the corresponding config in ./config/glider_bc.json:

# PARTITION_NAME: your Slurm partition; NODE_NAME: the worker node; NUM_CPUS: CPU constraint
srun -p PARTITION_NAME \
     -w NODE_NAME \
     -c NUM_CPUS \
     deepspeed --num_gpus NUM_GPUS --master_port=PORT_NUMBER train_glider_bc.py
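
If you are not on a Slurm cluster, the srun wrapper can be dropped and DeepSpeed launched directly; a minimal sketch, assuming 4 GPUs and an arbitrary free port:

deepspeed --num_gpus 4 --master_port=29500 train_glider_bc.py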

Or simply run the shell script:

sh glider_bc.sh

ORL

Set the data collection mode in ./config/collection.json, then run:

srun -p PARTITION_NAME \
     -w NODE_NAME \
     -c NUM_CPUS \
     deepspeed --num_gpus 1 --master_port=PORT_NUMBER glider_data_collection.py
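
As an illustration, the same command with hypothetical placeholder values filled in (partition gpu, node node01, 16 CPUs; replace with values valid for your cluster):

srun -p gpu \
     -w node01 \
     -c 16 \
     deepspeed --num_gpus 1 --master_port=29501 glider_data_collection.py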

Then run ORL training with the following script, using the corresponding config in ./config/glider_awac.json:

sh glider_awac.sh 
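
As an optional sanity check before launching, you can verify that the config parses as valid JSON; a minimal sketch using the Python standard library:

python -c "import json; json.load(open('./config/glider_awac.json')); print('config OK')"
sh glider_awac.sh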

O2O

Set the task name and checkpoint path in ./config/glider_o2o.json, then run O2O training:

# PARTITION_NAME: your Slurm partition; NODE_NAME: the worker node; NUM_CPUS: CPU constraint
srun -p PARTITION_NAME \
     -w NODE_NAME \
     -c NUM_CPUS \
     deepspeed --num_gpus NUM_GPUS --master_port=PORT_NUMBER train_glider_online.py
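
Before launching, it can help to confirm that the checkpoint directory referenced in ./config/glider_o2o.json actually exists; a sketch with an assumed path, so substitute the checkpoint you configured:

CKPT_DIR=./checkpoints/glider_awac   # assumed path; set to the checkpoint in glider_o2o.json
[ -d "$CKPT_DIR" ] || echo "warning: checkpoint directory not found: $CKPT_DIR"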

Evaluation

Set the evaluation settings in ./config/eval.json, then run the shell script:

sh eval.sh
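
To keep a record of evaluation runs, the output can optionally be captured to a timestamped log file, for example:

sh eval.sh 2>&1 | tee eval_$(date +%Y%m%d_%H%M%S).log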
