
ReSeek: A Self-Correcting Framework for Search Agents with Instructive Rewards

Shiyu Li, Yang Tang, Yifan Wang, Peiming Li, Xi Chen
Basic Algorithm Center, PCG, Tencent
Tsinghua Shenzhen International Graduate School, Tsinghua University

🔥 News

  • [2025.10.14] Released the initial codebase.
  • [2025.10.01] Released the dataset, leaderboard, model, and paper.

🤗 Resources

| Type | Links |
| --- | --- |
| Models | ReSeek-qwen2.5-3b-em-grpo |
| Datasets | FictionalHot |
| Leaderboard | Search Agent Leaderboard |

📌 Introduction

  • We propose ReSeek, a novel reinforcement learning framework that enables search agents to dynamically identify and recover from erroneous search paths during an episode through a self-correction mechanism.
  • Through a special JUDGE action, the agent can evaluate retrieved information and re-plan its search strategy (a minimal sketch of this loop follows the list). We design a dense, instructive reward function that provides fine-grained feedback on both factual correctness and contextual utility.
  • We advocate for the Hot Benchmark evaluation principle and introduce FictionalHot as a contamination-resistant benchmark. Extensive experiments show that ReSeek significantly outperforms SOTA baselines in task success rate and path faithfulness.
  • ReSeek particularly excels in multi-hop reasoning scenarios, demonstrating robust self-correction capabilities in complex knowledge-intensive tasks.
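
The JUDGE loop can be pictured with a short sketch. Everything below is illustrative only: the names (ToyAgent, search, judge, answer) are hypothetical stand-ins for the LLM-emitted actions described in the paper, not the repo's actual API.

# Minimal sketch of the JUDGE-driven self-correction loop.
# All names are hypothetical; they stand in for LLM-emitted actions.

def search(query: str) -> list[str]:
    """Stub retriever; in ReSeek this is the launched retrieval service."""
    return [f"document retrieved for: {query}"]

class ToyAgent:
    def propose_query(self, question: str, context: list[str]) -> str:
        # Re-plan: refine the query once earlier evidence proved insufficient.
        return question if not context else f"{question} (refined)"

    def judge(self, question: str, context: list[str]) -> bool:
        # JUDGE action: decide whether the gathered evidence suffices.
        return len(context) >= 2

    def answer(self, question: str, context: list[str]) -> str:
        return f"answer to {question!r} drawn from {len(context)} documents"

def run_episode(agent: ToyAgent, question: str, max_turns: int = 4) -> str:
    context: list[str] = []
    for _ in range(max_turns):
        context += search(agent.propose_query(question, context))
        if agent.judge(question, context):  # evidence judged sufficient
            break                           # stop searching and answer
        # judged insufficient: the loop continues with a re-planned query
    return agent.answer(question, context)

print(run_episode(ToyAgent(), "Who directed Inception?"))

During training, the dense instructive reward scores both the factual correctness of the final answer and the contextual utility of retrieved passages, so judging and re-planning well is itself rewarded.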

🛠 Dependencies

Basic Installation

# Clone the repository
git clone https://github.com/TencentBAC/ReSeek.git
cd ReSeek

conda create -n ReSeek python=3.10
conda activate ReSeek

bash scripts/install_vllm_sglang_mcore.sh

# install this repo (built on verl) in editable mode
pip install --no-deps -e .

Optional Dependencies

NPU (Ascend) Support:

# follow https://verl.readthedocs.io/en/latest/ascend_tutorial/ascend_quick_start.html to install vllm & vllm-ascend

pip install -r requirements-npu.txt
pip install -e .

📖 Quick Start

(1) Environment Variables

Before running training scripts, set the following environment variables:

# Set project root directory
export PROJECT_ROOT=/path/to/ReSeek

# Set model directory
export MODEL_DIR=/path/to/models

# Set data directory
export DATA_DIR=/path/to/datasets

(2) Data Preparation

Download and preprocess the ReSeek training dataset:

# Preprocess dataset
python utils/preprocess_reseek_dataset.py \
  --hf_repo_id TencentBAC/ReSeek_train_test \
  --local_dir ${DATA_DIR}/processed_dataset
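
Optionally, sanity-check the output before training. The file layout below is an assumption (verl-style pipelines typically write train/test parquet splits); adjust the names to whatever the script actually emits.

# Hypothetical sanity check on the preprocessed splits; the
# train.parquet file name is an assumption about verl-style output.
import os
import pandas as pd

data_dir = os.path.join(os.environ["DATA_DIR"], "processed_dataset")
df = pd.read_parquet(os.path.join(data_dir, "train.parquet"))
print(df.shape)             # number of examples and columns
print(df.columns.tolist())  # schema of the processed records
print(df.iloc[0])           # one full example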

(3) Download Pre-trained Models

# Download base model (e.g., Qwen2.5-3B-Instruct)
huggingface-cli download --resume-download Qwen/Qwen2.5-3B-Instruct --local-dir Qwen2.5-3B-Instruct

# (Optional) Download ReSeek fine-tuned model
huggingface-cli download --resume-download TencentBAC/ReSeek-qwen2.5-3b-em-grpo --local-dir ReSeek-qwen2.5-3b-em-grpo

(4) Build Retrieval Index (optional)

Using Transformers:

cd search/retrieval
bash build_index.sh

Using vLLM:

cd search/retrieval
bash build_index_vllm.sh

(5) Launch Retrieval Service

cd search
bash retrieval_launch.sh
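
Before starting training, it helps to smoke-test the service. The host, port, route, and payload schema below follow the Search-R1-style retrieval server this repo builds on; they are assumptions, so confirm them against retrieval_launch.sh.

# Hypothetical smoke test; endpoint and schema are assumptions.
import requests

resp = requests.post(
    "http://127.0.0.1:8000/retrieve",
    json={"queries": ["Who wrote Hamlet?"], "topk": 3},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # retrieved passages for the test query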

(6) Conduct RL Training

(Optional) On Ascend NPU, set the parameter trainer.device=npu in the training configuration.

GRPO Training:

cd scripts

# 3B model
bash train_grpo.sh

# 7B model
bash train_grpo_7b.sh

PPO Training:

cd scripts

# 3B model
bash train_ppo.sh

# 7B model
bash train_ppo_7b.sh

💡 Performance

📊 Main Results

ReSeek achieves state-of-the-art performance across eight open-domain QA benchmarks:

  • Qwen2.5-7B: Average accuracy of 0.377, surpassing ZeroSearch's 0.346
  • Multi-hop Reasoning: Excels on complex multi-hop benchmarks like HotpotQA and Bamboogle
  • FictionalHot: Scores 0.061 on the contamination-resistant stress test, while Direct Inference achieves only ~0.001

📊 Hot Benchmark

We propose the Hot Benchmark evaluation principle to address inconsistencies in experimental settings:

  • Test Sets: All 7 datasets (NQ, TriviaQA, PopQA, HotpotQA, 2Wiki, Musique, Bamboogle)
  • Training Set: Unified training set merging NQ and HotpotQA training splits
  • Corpus: 2018 Wikipedia corpus (wiki-18) for reproducible evaluation
  • Metrics: Exact Match (EM) as the primary metric for fair comparison (a standard EM recipe is sketched after this list)
  • Retrieval: Top-k=3 with maximum T=4 tool-use turns per question
  • Embeddings: E5 embeddings for search backend
  • Models: Qwen2.5-3B/7B-Instruct as backbone models
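
For reference, the standard open-domain QA EM recipe normalizes both strings before comparing. This is the common recipe, not necessarily the repo's exact scorer, whose normalization may differ in details.

# Standard EM: lowercase, strip punctuation and articles, collapse
# whitespace, then compare. A sketch, not the project's scoring code.
import re
import string

def normalize(text: str) -> str:
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> bool:
    return normalize(prediction) == normalize(gold)

print(exact_match("The Eiffel Tower", "eiffel tower"))  # True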

📊 Self-Correction Case Study

ReSeek demonstrates robust self-correction through the JUDGE action:

  1. After the initial search, the JUDGE action correctly identifies that the retrieved information is insufficient
  2. Triggers a second, targeted search
  3. Successfully retrieves the correct answer

This dynamic correction mechanism enables ReSeek to excel in complex multi-hop reasoning scenarios.

🙏 Acknowledgements

This work builds on Search-R1 and veRL. We sincerely thank the authors of these projects for their valuable contributions to the open-source community.

📧 Contact

If you have any questions, feel free to reach out.

🚩 Citation

If you find this work helpful, please cite it as:

@article{li2025reseek,
  title={ReSeek: A Self-Correcting Framework for Search Agents with Instructive Rewards},
  author={Li, Shiyu and Tang, Yang and Wang, Yifan and Li, Peiming and Chen, Xi},
  journal={arXiv preprint arXiv:2510.00568},
  year={2025}
}

📄 License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
