- [2025.10.15] We've released a significant update to our paper, now available on arXiv! This new version includes more practical and rigorous experiments, showcasing the real-world capabilities of our DoctorAgent-RL model. We've conducted a thorough evaluation of the model on real patient diagnoses and validated its performance with expert feedback.
- [2025.6.16] We released the source code on GitHub and the models on Hugging Face!
- [2025.5.26] We released our paper on arXiv!
Introducing DoctorAgent-RL: A multi-agent collaborative reinforcement learning framework revolutionizing clinical dialogue. By modeling medical consultations as dynamic decision-making processes under uncertainty, DoctorAgent-RL directly addresses the critical limitations of static clinical dialogue systems, enabling:
- Adaptive Information Gathering: Intelligent adjustment of dialogue paths based on patient responses.
- Clinical Reasoning Alignment: Autonomous development of interaction strategies consistent with medical logic.
- Overcoming Static Paradigms: Moving beyond superficial pattern imitation in existing dialogue datasets.
Through continuous multi-turn interactions between doctor and patient agents, optimized via reinforcement learning, DoctorAgent-RL achieves significant improvements in diagnostic accuracy and interaction efficiency.
- Multi-Agent Collaboration: Doctor and patient agents with distinct roles and objectives.
- Dynamic Strategy Optimization: Reinforcement learning-based policy updates for adaptive behavior.
- Comprehensive Reward Design: Multi-dimensional consultation evaluation metrics guiding optimal strategies (see the sketch after this list).
- Medical Knowledge Integration: Clinical reasoning logic embedded in decision-making processes.
- MTMedDialog Dataset: The first English multi-turn medical consultation dataset with patient-simulation capabilities.
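To illustrate what a multi-dimensional consultation reward can look like, here is a minimal sketch; the terms and weights below are assumptions for exposition, not the exact formulation used in the paper:

```python
def consultation_reward(diagnosis_correct: bool,
                        info_gain: float,
                        num_turns: int,
                        turn_budget: int) -> float:
    """Hypothetical multi-dimensional reward combining diagnostic accuracy,
    information gathered during the consultation, and turn efficiency.
    The weights (0.6 / 0.3 / 0.1) are illustrative placeholders."""
    accuracy = 1.0 if diagnosis_correct else 0.0
    # Spending fewer turns relative to the budget earns more efficiency credit.
    efficiency = max(0.0, 1.0 - num_turns / turn_budget)
    return 0.6 * accuracy + 0.3 * info_gain + 0.1 * efficiency
```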
Our framework consists of three core components that interact in a continuous learning loop (see the sketch after this list):
- Doctor Agent: Responsible for diagnostic reasoning and formulating appropriate questions.
- Patient Agent: Simulates patient responses based on a given medical history and symptom progression.
- Consultation Evaluator: Provides comprehensive feedback to the agents through multi-dimensional reward signals, assessing the quality of the consultation.
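To make the interaction concrete, here is a minimal sketch of how the three components can be wired together in one simulated episode; the callables and the `DIAGNOSIS:` stop marker are hypothetical stand-ins, not the repository's actual API:

```python
def run_consultation(doctor, patient, evaluator, max_turns: int) -> float:
    """One simulated consultation episode. `doctor`, `patient`, and
    `evaluator` are callables standing in for the three core components."""
    dialogue: list[tuple[str, str]] = []  # (role, utterance) transcript
    for _ in range(max_turns):
        question = doctor(dialogue)            # ask a question or commit to a diagnosis
        dialogue.append(("doctor", question))
        if question.startswith("DIAGNOSIS:"):  # hypothetical termination signal
            break
        answer = patient(dialogue)             # answer from the simulated case record
        dialogue.append(("patient", answer))
    return evaluator(dialogue)                 # multi-dimensional reward signal
```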
The reinforcement learning process involves:
- Multi-turn dialogue simulation between the doctor and patient agents.
- Dynamic reward calculation based on real-time consultation quality and objectives.
- Policy updates using reinforcement learning algorithms such as Group Relative Policy Optimization (GRPO); a minimal sketch of the group-relative advantage follows this list.
- Continuous strategy refinement through iterative interactions, driving the agents toward optimal diagnostic and communication strategies.
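As referenced above, here is a minimal sketch of GRPO's group-relative advantage computation, assuming scalar episode rewards; it shows the core idea (normalizing each sampled consultation against its group) rather than the full clipped policy-gradient objective:

```python
import numpy as np

def grpo_advantages(group_rewards: list[float]) -> np.ndarray:
    """Group-relative advantages: each trajectory's reward is normalized
    against the mean/std of the group sampled for the same patient case,
    so no learned value network (critic) is needed."""
    r = np.asarray(group_rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

# Example: four consultations sampled for the same case.
print(grpo_advantages([0.9, 0.4, 0.7, 0.2]))  # above-mean episodes get positive advantage
```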
Our experiments demonstrate the effectiveness of DoctorAgent-RL across various metrics.
We selected Qwen2.5-7B-Instruct as the foundation for our Patient Agent in these experiments, evaluating its fidelity in simulating realistic patient behaviors.
The Doctor Agent's performance was rigorously evaluated for its diagnostic accuracy and efficiency in information gathering.
An ablation study was conducted to understand the contribution of each core component of DoctorAgent-RL to its overall performance.
We also investigated the framework's adaptability under varying turn budgets, highlighting its robust performance across different interaction lengths.
To set up your environment and run DoctorAgent-RL, follow these steps:
```bash
git clone https://github.com/JarvisUSTC/DoctorAgent-RL.git
cd DoctorAgent-RL
```

Follow RAGEN's setup script:

```bash
bash scripts/setup_ragen.sh
```

Download the following model checkpoints:

- Qwen2.5-7B-Instruct
- DoctorAgent-RL-SFT-1k-Thinking (Our SFT Model)
- DoctorAgent-RL (Our RL Model)
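If the checkpoints are hosted on the Hugging Face Hub, a minimal download sketch looks like the following; the Qwen repo id is real, but the DoctorAgent-RL repo ids are placeholders, so check the release page for the exact names:

```python
from huggingface_hub import snapshot_download

# Base model for the patient agent (real Hub repo id).
snapshot_download("Qwen/Qwen2.5-7B-Instruct")

# Placeholder repo ids -- substitute the actual ids from the release page.
# snapshot_download("<org>/DoctorAgent-RL-SFT-1k-Thinking")
# snapshot_download("<org>/DoctorAgent-RL")
```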
Once your environment is set up, you can run the experiments:
Our preprocessed training data is located in the data/ directory. For Supervised Fine-Tuning (SFT) cold start, we use the MTMedDialog_sft_train.parquet dataset. This dataset was created by prompting DeepSeek-V3 to generate the thinking process for each sample.
For Reinforcement Learning (RL) training, we utilize the MTMedDialog_RL.parquet dataset. This dataset includes detailed patient descriptions, which were generated by prompting Qwen2.5-7B-Instruct. Notably, Qwen2.5-7B-Instruct also serves as our patient agent in the RL setup.
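A quick way to sanity-check the datasets before training; the file paths follow the data/ layout described above, and we print the schemas rather than assume specific column names:

```python
import pandas as pd

sft_df = pd.read_parquet("data/MTMedDialog_sft_train.parquet")
rl_df = pd.read_parquet("data/MTMedDialog_RL.parquet")

# Inspect sizes and schemas; column names depend on the released files.
print(len(sft_df), sft_df.columns.tolist())
print(len(rl_df), rl_df.columns.tolist())
```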
```bash
# Example:

# Dynamic Turns + SFT Cold Start
bash scripts_exp/doctor-agent-rl-dynamic.sh

# Reward Model + Dynamic Turns + SFT Cold Start
bash scripts_exp/doctor-agent-rl-rm-dynamic.sh

# Reward Model + SFT Cold Start
bash scripts_exp/doctor-agent-rl-rm.sh

# Reward Model + Dynamic Turns (without SFT cold start)
bash doctor-agent-rl-dynamic-wo-sft.sh
```

For the SFT cold start itself, you can use sft/finetune_lora_med.sh or LLaMA-Factory.
The evaluation scripts are located in the ragen/env/medical_consultation/evaluation/ directory.
```bash
# Example:
bash ragen/env/medical_consultation/evaluation/run_eval_patientllm_category.sh ${MODEL_PATH}

# To run against an API, configure the API key and request command first
bash ragen/env/medical_consultation/evaluation/run_eval_patientllm_category_api.sh ${MODEL_NAME}
```

For more detailed command-line arguments and configuration options, please refer to the individual script files in the repository.
If DoctorAgent-RL contributes to your research, please consider citing our work:
```bibtex
@article{feng2025doctoragent,
  title={DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue},
  author={Feng, Yichun and Wang, Jiawei and Zhou, Lu and Li, Yixue},
  journal={arXiv preprint arXiv:2505.19630},
  year={2025}
}
```