When LLM Agents Meet Reinforcement Learning

AgentsMeetRL is an awesome list that summarizes open-source repositories for training LLM Agents using reinforcement learning:

🤖 A project counts as an agent project if it involves at least one of the following: multi-turn interaction or tool use. Tool-Integrated Reasoning (TIR) projects are therefore included in this repo.
⚠️ The entries are based on code analysis of open-source repositories with GitHub Copilot Agent, so some details may be inaccurate. They have been manually reviewed, but errors and omissions may remain. If you find any, please let us know through issues or PRs; we warmly welcome them!
🚀 We focus on the reinforcement learning frameworks, RL algorithms, rewards, and environments that each project depends on, as a reference for how these open-source projects make their technical choices. See [Click to view technical details] under each table.
🤗 Feel free to submit your own projects anytime - we welcome contributions!

Enumerations used in the tables:

Reward Type:
- External Verifier: e.g., a compiler or math solver
- Rule-Based: e.g., a LaTeX parser with exact match scoring
- Model-Based: e.g., a trained verifier LLM or reward LLM
- Custom
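
As a rough illustration of how the first three reward types differ, here is a minimal Python sketch. The function names, the whitespace normalization, and the use of Python's built-in `compile` as a stand-in for an external verifier are all assumptions made for this example; they are not taken from any listed project.

```python
import re
from typing import Callable


def _normalize(text: str) -> str:
    """Strip all whitespace and lowercase, so trivial formatting differences are ignored."""
    return re.sub(r"\s+", "", text).lower()


def rule_based_reward(response: str, reference: str) -> float:
    """Rule-Based (hypothetical): exact-match scoring after normalization."""
    return 1.0 if _normalize(response) == _normalize(reference) else 0.0


def external_verifier_reward(code: str) -> float:
    """External Verifier (hypothetical): Python's compiler decides whether the code parses."""
    try:
        compile(code, "<candidate>", "exec")
        return 1.0
    except SyntaxError:
        return 0.0


def model_based_reward(response: str, scorer: Callable[[str], float]) -> float:
    """Model-Based (hypothetical): delegate to a learned scorer and clamp to [0, 1]."""
    return max(0.0, min(1.0, scorer(response)))
```

In practice the `scorer` argument would wrap a trained verifier or reward LLM; here any callable that returns a float works, which keeps the sketch runnable without a model.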

🔧 Base Framework

Github Repo	Date	Org	Paper Link
siiRL	2025.7	Shanghai Innovation Institute	Paper
slime	2025.6	Tsinghua University (THUDM)	blog
agent-lightning	2025.6	Microsoft Research	Paper
AReaL	2025.6	AntGroup/Tsinghua	Paper
ROLL	2025.6	Alibaba	Paper
MARTI	2025.5	Tsinghua	--
RL2	2025.4	Accio	--
verifiers	2025.3	Individual	--
oat	2024.11	NUS/Sea AI	Paper
veRL	2024.10	ByteDance	Paper
OpenRLHF	2023.7	OpenRLHF	Paper
trl	2019.11	HuggingFace	--

📋 Click to view technical details

Github Repo	RL Algorithm	Single/Multi Agent	Outcome/Process Reward	Single/Multi Turn	Task	Reward Type	Tool usage
siiRL	PPO/GRPO/CPGD/MARFT	Multi	Both	Multi	LLM/VLM/LLM-MAS PostTraining	Model/Rule	Planned
slime	GRPO/GSPO/REINFORCE++	Single	Both	Both	Math/Code	External Verifier	Yes
agent-lightning	PPO/Custom/Automatic Prompt Optimization	Multi	Outcome	Multi	Calculator/SQL	Model/External/Rule	Yes
AReaL	PPO	Both	Outcome	Both	Math/Code	External	Yes
ROLL	PPO/GRPO/Reinforce++/TOPR/RAFT++	Multi	Both	Multi	Math/QA/Code/Alignment	All	Yes
MARTI	PPO/GRPO/REINFORCE++/TTRL	Multi	Both	Multi	Math	All	Yes
RL2	Dr. GRPO/PPO/DPO	Single	Both	Both	QA/Dialogue	Rule/Model/External	Yes
verifiers	GRPO	Multi	Outcome	Both	Reasoning/Math/Code	All	Code
oat	PPO/GRPO	Single	Outcome	Multi	Math/Alignment	External	No
veRL	PPO/GRPO	Single	Outcome	Both	Math/QA/Reasoning/Search	All	Yes
OpenRLHF	PPO/REINFORCE++/GRPO/DPO/IPO/KTO/RLOO	Multi	Both	Both	Dialogue/Chat/Completion	Rule/Model/External	Yes
trl	PPO/GRPO/DPO	Single	Both	Single	QA	Custom	No

💪 General/MultiTask

Github Repo	Date	Org	Paper Link	RL Framework
AgentGym-RL	2025.9	Fudan University	Paper	veRL
Agent_Foundation_Models	2025.8	OPPO Personal AI Lab	Paper	veRL
SPA-RL-Agent	2025.5	PolyU	Paper	TRL
verl-agent	2025.5	NTU/Skywork	Paper	veRL

📋 Click to view technical details

Github Repo	RL Algorithm	Single/Multi Agent	Outcome/Process Reward	Single/Multi Turn	Task	Reward Type	Tool usage
AgentGym-RL	PPO/GRPO/RLOO/REINFORCE++	Single	Outcome	Multi	Web/Search/Game/Embodied/Science	Rule/Model/External	Yes (Web, Search, Env APIs)
Agent_Foundation_Models	DAPO/PPO	Single	Outcome	Single	QA/Code/Math	Rule/External	Yes
SPA-RL-Agent	PPO	Single	Process	Multi	Navigation/Web/TextGame	Model	No
verl-agent	PPO/GRPO/GiGPO/DAPO/RLOO/REINFORCE++	Multi	Both	Multi	Phone Use/Math/Code/Web/TextGame	All	Yes

🔍 Search/Research/Web

Github Repo	Date	Org	Paper Link	RL Framework
Tree-GRPO	2025.9	AMAP	Paper	veRL
ASearcher	2025.8	Ant Research RL Lab/Tsinghua University/UW	Paper	RealHF/AReaL
Kimi-Researcher	2025.6	Moonshot AI	blog	Custom
TTI	2025.6	CMU	Paper	Custom
R-Search	2025.6	Individual	--	veRL
R1-Searcher-plus	2025.5	RUC	Paper	Custom
StepSearch	2025.5	SenseTime	Paper	veRL
AutoRefine	2025.5	USTC	Paper	veRL
ZeroSearch	2025.5	Alibaba	Paper	veRL
WebThinker	2025.4	RUC	Paper	Custom
DeepResearcher	2025.4	SJTU	Paper	veRL
Search-R1	2025.3	UIUC/Google	paper1, paper2	veRL
R1-Searcher	2025.3	RUC	Paper	OpenRLHF
C-3PO	2025.2	Alibaba	Paper	OpenRLHF
WebAgent	2025.1	Alibaba	paper1, paper2	LLaMA-Factory

📋 Click to view technical details

Github Repo	RL Algorithm	Single/Multi Agent	Outcome/Process Reward	Single/Multi Turn	Task	Reward Type	Tool usage
Tree-GRPO	GRPO/Tree-GRPO	Single	Outcome	Multi	Search	Rule	Search
ASearcher	PPO/GRPO + Decoupled PPO	Single	Outcome	Multi	Math/Code/SearchQA	External/Rule	Yes
Kimi-Researcher	REINFORCE	Single	Outcome	Multi	Research	Outcome	Search, Browse, Coding
TTI	REINFORCE/BC	Single	Outcome	Multi	Web	External	Web Browsing
R-Search	PPO/GRPO	Single	Both	Multi	QA/Search	All	Yes
R1-Searcher-plus	Custom	Single	Outcome	Multi	Search	Model	Search
StepSearch	PPO	Single	Process	Multi	QA	Model	Search
AutoRefine	PPO/GRPO	Multi	Both	Multi	RAG QA	Rule	Search
ZeroSearch	PPO/GRPO/REINFORCE	Single	Outcome	Multi	QA/Search	Rule	Yes
WebThinker	DPO	Single	Outcome	Multi	Reasoning/QA/Research	Model/External	Web Browsing
DeepResearcher	PPO/GRPO	Multi	Outcome	Multi	Research	All	Yes
Search-R1	PPO/GRPO	Single	Outcome	Multi	Search	All	Search
R1-Searcher	PPO/DPO	Single	Both	Multi	Search	All	Yes
C-3PO	PPO	Multi	Outcome	Multi	Search	Model	Yes
WebAgent	DAPO	Multi	Process	Multi	Web	Model	Yes

📱 GUI

Github Repo	Date	Org	Paper Link	RL Framework
MobileAgent	2025.9	X-PLUG (TongyiQwen)	paper	veRL
InfiGUI-G1	2025.8	InfiX AI	Paper	veRL
Grounding-R1	2025.6	Salesforce	blog	trl
AgentCPM-GUI	2025.6	OpenBMB/Tsinghua/RUC	Paper	Huggingface
ARPO	2025.5	CUHK/HKUST	Paper	veRL
GUI-G1	2025.5	RUC	Paper	TRL
GUI-R1	2025.4	CAS/NUS	Paper	veRL
UI-R1	2025.3	vivo/CUHK	Paper	TRL

📋 Click to view technical details

Github Repo	RL Algorithm	Single/Multi Agent	Outcome/Process Reward	Single/Multi Turn	Task	Reward Type	Tool usage
MobileAgent	semi-online RL	Single	Both	Multi	MobileGUI/Automation	Rule	Yes
InfiGUI-G1	AEPO	Single	Outcome	Single	GUI/Grounding	Rule	No
Grounding-R1	GRPO	Single	Outcome	Multi	GUI Grounding	Model	Yes
AgentCPM-GUI	GRPO	Single	Outcome	Multi	Mobile GUI	Model	Yes
ARPO	GRPO	Single	Outcome	Multi	GUI	External	Computer Use
GUI-G1	GRPO	Single	Outcome	Single	GUI	Rule/External	No
GUI-R1	GRPO	Single	Outcome	Multi	GUI	Rule	No
UI-R1	GRPO	Single	Process	Both	GUI	Rule	Computer/Phone Use

🔨 Tool

Github Repo	Date	Org	Paper Link	RL Framework
MiroRL	2025.8	MiroMindAI	HF Repo	veRL
verl-tool	2025.6	TIGER-Lab	X	veRL
Multi-Turn-RL-Agent	2025.5	University of Minnesota	Paper	Custom
Tool-N1	2025.5	NVIDIA	Paper	veRL
Tool-Star	2025.5	RUC	Paper	LLaMA-Factory
RL-Factory	2025.5	Simple-Efficient	model	veRL
ReTool	2025.4	ByteDance	Paper	veRL
AWorld	2025.3	Ant Group (inclusionAI)	Paper	veRL
Agent-R1	2025.3	USTC	--	veRL
ReCall	2025.3	BaiChuan	Paper	veRL

📋 Click to view technical details

Github Repo	RL Algorithm	Single/Multi Agent	Outcome/Process Reward	Single/Multi Turn	Task	Reward Type	Tool usage
MiroRL	GRPO	Single	Both	Multi	Reasoning/Planning/ToolUse	Rule-based	MCP
verl-tool	PPO/GRPO	Single	Both	Both	Math/Code	Rule/External	Yes
Multi-Turn-RL-Agent	GRPO	Single	Both	Multi	Tool-use/Math	Rule/External	Yes
Tool-N1	PPO	Single	Outcome	Multi	Math/Dialogue	All	Yes
Tool-Star	PPO/DPO/ORPO/SimPO/KTO	Single	Outcome	Multi	Multi-modal/Tool Use/Dialogue	Model/External	Yes
RL-Factory	GRPO	Multi	Both	Multi	Tool-use/NL2SQL	All	MCP
ReTool	PPO	Single	Outcome	Multi	Math	External	Code
AWorld	GRPO	Both	Outcome	Multi	Search/Web/Code	External/Rule	Yes
Agent-R1	PPO/GRPO	Single	Both	Multi	Tool-use/QA	Model	Yes
ReCall	PPO/GRPO/RLOO/REINFORCE++/ReMax	Single	Outcome	Multi	Tool-use/Math/QA	All	Yes

🎮 TextGame

Github Repo	Date	Org	Paper Link	RL Framework
ARIA	2025.6	Fudan University	Paper	Custom
AMPO	2025.5	Tongyi Lab, Alibaba	Paper	veRL
Trinity-RFT	2025.5	Alibaba	Paper	veRL
VAGEN	2025.3	RAGEN-AI	Paper	veRL
ART	2025.3	OpenPipe	Paper	TRL
OpenManus-RL	2025.3	UIUC/MetaGPT	--	Custom
RAGEN	2025.1	RAGEN-AI	Paper	veRL

📋 Click to view technical details

Github Repo	RL Algorithm	Single/Multi Agent	Outcome/Process Reward	Single/Multi Turn	Task	Reward Type	Tool usage
ARIA	REINFORCE	Both	Process	Multi	Negotiation/Bargaining	Other	No
AMPO	BC/AMPO(GRPO improvement)	Multi	Outcome	Multi	Social Interaction	Model-based	No
Trinity-RFT	PPO/GRPO	Single	Outcome	Both	Math/TextGame/Web	All	Yes
VAGEN	PPO/GRPO	Single	Both	Multi	TextGame/Navigation	All	Yes
ART	GRPO	Multi	Both	Multi	TextGame	All	Yes
OpenManus-RL	PPO/DPO/GRPO	Multi	Outcome	Multi	TextGame	All	Yes
RAGEN	PPO/GRPO	Single	Both	Multi	TextGame	All	Yes

💻 Code

Github Repo	Date	Org	Paper Link	RL Framework
RepoDeepSearch	2025.8	PKU, Bytedance, BIT	Paper	veRL
MedAgentGym	2025.6	Emory/Georgia Tech	Paper	Huggingface
CURE	2025.6	University of Chicago/Princeton/ByteDance	Paper	Huggingface
MASLab	2025.5	MASWorks	Paper	Custom
Time-R1	2025.5	UIUC	Paper	veRL
ML-Agent	2025.5	MASWorks	Paper	Custom
SkyRL	2025.4	NovaSky	--	veRL
digitalhuman	2025.4	Tencent	Paper	veRL
sweet_rl	2025.3	Meta/UCB	Paper	OpenRLHF
rllm	2025.1	Berkeley Sky Computing Lab/BAIR/Together AI	Notion Blog	veRL
open-r1	2025.1	HuggingFace	--	TRL

📋 Click to view technical details

Github Repo	RL Algorithm	Single/Multi Agent	Outcome/Process Reward	Single/Multi Turn	Task	Reward Type	Tool usage
RepoDeepSearch	GRPO	Single	Both	Multi	Search/Repair	Rule/External	Yes
MedAgentGym	SFT/DPO/PPO/GRPO	Single	Outcome	Multi	Medical/Code	External	Yes
CURE	PPO	Single	Outcome	Single	Code	External	No
MASLab	NO RL	Multi	Outcome	Multi	Code/Math/Reasoning	External	Yes
Time-R1	PPO/GRPO/DPO	Multi	Outcome	Multi	Temporal	All	Code
ML-Agent	Custom	Single	Process	Multi	Code	All	Yes
SkyRL	PPO/GRPO	Single	Outcome	Multi	Math/Code	All	Code
digitalhuman	PPO/GRPO/ReMax/RLOO	Multi	Outcome	Multi	Empathy/Math/Code/MultimodalQA	Rule/Model/External	Yes
sweet_rl	DPO	Multi	Process	Multi	Design/Code	Model	Web Browsing
rllm	PPO/GRPO	Single	Outcome	Multi	Code Edit	External	Yes
open-r1	GRPO	Single	Outcome	Single	Math/Code	All	Yes

🤔 QA(Reasoning/Math)

Github Repo	Date	Org	Paper Link	RL Framework
AgentFlow	2025.9	Stanford University	arXiv	veRL
ARPO	2025.7	RUC, Kuaishou	Paper	veRL
terminal-bench-rl	2025.7	Individual (Danau5tin)	N/A	rLLM
MOTIF	2025.6	University of Maryland	Paper	trl
cmriat/l0	2025.6	CMRIAT	Paper	veRL
agent-distillation	2025.5	KAIST	Paper	Custom
VDeepEyes	2025.5	Xiaohongshu/XJTU	Paper	veRL
EasyR1	2025.4	Individual	repo1/paper2	veRL
AutoCoA	2025.3	BJTU	Paper	veRL
ToRL	2025.3	SJTU	Paper	veRL
ReMA	2025.3	SJTU, UCL	Paper	veRL
Agentic-Reasoning	2025.2	Oxford	Paper	Custom
SimpleTIR	2025.2	NTU, Bytedance	Notion Blog	veRL
openrlhf_async_pipline	2024.5	OpenRLHF	Paper	OpenRLHF

📋 Click to view technical details

Github Repo	RL Algorithm	Single/Multi Agent	Outcome/Process Reward	Single/Multi Turn	Task	Reward Type	Tool usage
AgentFlow	Flow-GRPO	Single	Outcome	Multi	Search/Math/QA	Model/External	Yes
ARPO	GRPO	Single	Outcome	Multi	Math/Coding	Model/Rule	Yes
terminal-bench-rl	GRPO	Single	Outcome	Multi	Coding/Terminal	Model+External Verifier	Yes
MOTIF	GRPO	Single	Outcome	Multi	QA	Rule	No
cmriat/l0	PPO	Multi	Process	Multi	QA	All	Yes
agent-distillation	PPO	Single	Process	Multi	QA/Math	External	Yes
VDeepEyes	PPO/GRPO	Multi	Process	Multi	VQA	All	Yes
EasyR1	GRPO	Single	Process	Multi	Vision-Language	Model	Yes
AutoCoA	GRPO	Multi	Outcome	Multi	Reasoning/Math/QA	All	Yes
ToRL	GRPO	Single	Outcome	Single	Math	Rule/External	Yes
ReMA	PPO	Multi	Outcome	Multi	Math	Rule	No
Agentic-Reasoning	Custom	Single	Process	Multi	QA/Math	External	Web Browsing
SimpleTIR	PPO/GRPO (with extensions)	Single	Outcome	Multi	Math/Coding	All	Yes
openrlhf_async_pipline	PPO/REINFORCE++/DPO/RLOO	Single	Outcome	Multi	Dialogue/Reasoning/QA	All	No

🧠 Memory

Github Repo	Date	Org	Paper Link	RL Framework
MEM1	2025.7	MIT	Paper	veRL (based on Search-R1)
Memento	2025.6	UCL, Huawei	Paper	Custom
MemAgent	2025.6	Bytedance, Tsinghua-SIA	Paper	veRL

📋 Click to view technical details

Github Repo	RL Algorithm	Single/Multi Agent	Outcome/Process Reward	Single/Multi Turn	Task	Reward Type	Tool usage
MEM1	PPO/GRPO	Single	Outcome	Multi	WebShop/GSM8K/QA	Rule/Model	Yes
Memento	soft Q-Learning	Single	Outcome	Multi	Research/QA/Code/Web	External/Rule	Yes
MemAgent	PPO/GRPO/DPO	Multi	Outcome	Multi	Long-context QA	Rule/Model/External	Yes

🦾 Embodied

Github Repo	Date	Org	Paper Link	RL Framework
Embodied-R1	2025.6	Tianjin University	Paper	veRL

📋 Click to view technical details

Github Repo	RL Algorithm	Single/Multi Agent	Outcome/Process Reward	Single/Multi Turn	Task	Reward Type	Tool usage
Embodied-R1	GRPO	Single	Outcome	Single	Grounding/Waypoint	Rule	No

🏥 Biomedical

Github Repo	Date	Org	Paper Link	RL Framework
MMedAgent-RL	2025.8	Unknown	paper	Unknown
DoctorAgent-RL	2025.5	UCAS/CAS/USTC	Paper	RAGEN
Biomni	2025.3	Stanford University (SNAP)	Paper	Custom

📋 Click to view technical details

Github Repo	RL Algorithm	Single/Multi Agent	Outcome/Process Reward	Single/Multi Turn	Task	Reward Type	Tool usage
MMedAgent-RL	Unknown	Multi	Unknown	Unknown	Unknown	Unknown	Unknown
DoctorAgent-RL	GRPO	Multi	Both	Multi	Consultation/Diagnosis	Model/Rule	No
Biomni	TBD	Single	TBD	Single	scRNAseq/CRISPR/ADMET/Knowledge	TBD	Yes

⛰️ Environment

Github Repo	Date	Org	Task
CompassVerifier	2025.7	Shanghai AI Lab	Knowledge/Math/Science/GeneralReasoning
Mind2Web-2	2025.6	Ohio State University	Web
gem	2025.5	Sea AI Lab	Math/Code/Game/QA
MLE-Dojo	2025.5	GIT, Stanford	MLE
atropos	2025.4	Nous Research	Game/Code/Tool
InternBootcamp	2025.4	InternBootcamp	Coding/QA/Game
loong	2025.3	CAMEL-AI.org	RLVR
reasoning-gym	2025.1	open-thought	Math/Game
llmgym	2025.1	tensorzero	TextGame/Tool
debug-gym	2024.11	Microsoft Research	Debugging/Game/Code
gym-llm	2024.8	Rodrigo Sánchez Molina	Control/Game
AgentGym	2024.6	Fudan	Web/Game
tau-bench	2024.6	Sierra	Tool
appworld	2024.6	Stony Brook University	Phone Use
android_world	2024.5	Google Research	Phone Use
TheAgentCompany	2024.3	CMU, Duke	Coding
LlamaGym	2024.3	Rohan Pandey	Game
visualwebarena	2024.1	CMU	Web
LMRL-Gym	2023.12	UC Berkeley	Game
OSWorld	2023.10	HKU, CMU, Salesforce, Waterloo	Computer Use
webarena	2023.7	CMU	Web
AgentBench	2023.7	Tsinghua University	Game/Web/QA/Tool
WebShop	2022.7	Princeton-NLP	Web
ScienceWorld	2022.3	AllenAI	TextGame/ScienceQA
factorio-learning-environment	2021.6	JackHopkins	Game
alfworld	2020.10	Microsoft, CMU, UW	Embodied
jericho	2018.10	Microsoft, GIT	TextGame
TextWorld	2018.6	Microsoft Research	TextGame

Under Review/Waiting for Open Source

Citation

If you find this repository useful, please consider citing it:

@misc{agentsMeetRL,
  title={When LLM Agents Meet Reinforcement Learning: A Comprehensive Survey},
  author={AgentsMeetRL Contributors},
  year={2025},
  url={https://github.com/thinkwee/agentsMeetRL}
}

Made with ❤️ by the AgentsMeetRL community
