AgentsMeetRL is an awesome list that summarizes open-source repositories for training LLM Agents using reinforcement learning:
- 🤖 A project counts as an agent project if it involves at least one of the following: multi-turn interaction or tool use. Tool-Integrated Reasoning (TIR) projects are therefore included in this repo.
- 🚀 We particularly focus on the reinforcement learning frameworks, RL algorithms, rewards, and environments that projects depend on, for everyone's reference on how these excellent open-source projects make their technical choices. See [Click to view technical details] under each table.
- 🤗 Feel free to submit your own projects anytime - we welcome contributions!

⚠️ This project is based on code analysis of open-source repositories using GitHub Copilot Agent, so some entries may be inaccurate. Although the entries were manually reviewed, there may still be omissions. If you find any errors, please let us know through issues or PRs - we warmly welcome them!
Enumerations used in the tables:
- Reward Type:
  - External Verifier: e.g., a compiler or math solver
  - Rule-Based: e.g., a LaTeX parser with exact match scoring
  - Model-Based: e.g., a trained verifier LLM or reward LLM
  - Custom
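
As a rough illustration of the Reward Type values above, the sketch below models them as a Python enum and shows what a Rule-Based reward with exact-match scoring might look like. The names `RewardType`, `extract_boxed`, and `rule_based_reward` are hypothetical and not taken from any listed repository, and the `\boxed{...}` answer convention is an assumption made only for this example.

```python
import re
from enum import Enum
from typing import Optional


class RewardType(Enum):
    """Reward Type values used in the technical-details tables (hypothetical enum)."""
    EXTERNAL_VERIFIER = "External Verifier"  # e.g., a compiler or math solver
    RULE_BASED = "Rule-Based"                # e.g., parser plus exact match
    MODEL_BASED = "Model-Based"              # e.g., a verifier or reward LLM
    CUSTOM = "Custom"


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in text, or None.

    Assumption: answers are wrapped in \\boxed{} and contain no nested braces.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def rule_based_reward(response: str, reference: str) -> float:
    """Score 1.0 if the extracted answer exactly matches the reference, else 0.0."""
    answer = extract_boxed(response)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0


if __name__ == "__main__":
    print(RewardType.RULE_BASED.value)
    print(rule_based_reward("So the answer is \\boxed{42}.", "42"))
    print(rule_based_reward("I am not sure.", "42"))
```

An External Verifier or Model-Based reward would replace the string comparison with a call to a compiler, solver, or reward model; the exact-match version is shown only because it needs no external dependency.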
| Github Repo | Date | Org | Paper Link |
|---|---|---|---|
| siiRL | 2025.7 | Shanghai Innovation Institute | Paper |
| slime | 2025.6 | Tsinghua University (THUDM) | blog |
| agent-lightning | 2025.6 | Microsoft Research | Paper |
| AReaL | 2025.6 | AntGroup/Tsinghua | Paper |
| ROLL | 2025.6 | Alibaba | Paper |
| MARTI | 2025.5 | Tsinghua | -- |
| RL2 | 2025.4 | Accio | -- |
| verifiers | 2025.3 | Individual | -- |
| oat | 2024.11 | NUS/Sea AI | Paper |
| veRL | 2024.10 | ByteDance | Paper |
| OpenRLHF | 2023.7 | OpenRLHF | Paper |
| trl | 2019.11 | HuggingFace | -- |
📋 Click to view technical details
| Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
|---|---|---|---|---|---|---|---|
| siiRL | PPO/GRPO/CPGD/MARFT | Multi | Both | Multi | LLM/VLM/LLM-MAS PostTraining | Model/Rule | Planned |
| slime | GRPO/GSPO/REINFORCE++ | Single | Both | Both | Math/Code | External Verifier | Yes |
| agent-lightning | PPO/Custom/Automatic Prompt Optimization | Multi | Outcome | Multi | Calculator/SQL | Model/External/Rule | Yes |
| AReaL | PPO | Both | Outcome | Both | Math/Code | External | Yes |
| ROLL | PPO/GRPO/Reinforce++/TOPR/RAFT++ | Multi | Both | Multi | Math/QA/Code/Alignment | All | Yes |
| MARTI | PPO/GRPO/REINFORCE++/TTRL | Multi | Both | Multi | Math | All | Yes |
| RL2 | Dr. GRPO/PPO/DPO | Single | Both | Both | QA/Dialogue | Rule/Model/External | Yes |
| verifiers | GRPO | Multi | Outcome | Both | Reasoning/Math/Code | All | Code |
| oat | PPO/GRPO | Single | Outcome | Multi | Math/Alignment | External | No |
| veRL | PPO/GRPO | Single | Outcome | Both | Math/QA/Reasoning/Search | All | Yes |
| OpenRLHF | PPO/REINFORCE++/GRPO/DPO/IPO/KTO/RLOO | Multi | Both | Both | Dialogue/Chat/Completion | Rule/Model/External | Yes |
| trl | PPO/GRPO/DPO | Single | Both | Single | QA | Custom | No |
| Github Repo | Date | Org | Paper Link | RL Framework |
|---|---|---|---|---|
| AgentGym-RL | 2025.9 | Fudan University | Paper | veRL |
| Agent_Foundation_Models | 2025.8 | OPPO Personal AI Lab | Paper | veRL |
| SPA-RL-Agent | 2025.5 | PolyU | Paper | TRL |
| verl-agent | 2025.5 | NTU/Skywork | Paper | veRL |
📋 Click to view technical details
| Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
|---|---|---|---|---|---|---|---|
| AgentGym-RL | PPO/GRPO/RLOO/REINFORCE++ | Single | Outcome | Multi | Web/Search/Game/Embodied/Science | Rule/Model/External | Yes (Web, Search, Env APIs) |
| Agent_Foundation_Models | DAPO/PPO | Single | Outcome | Single | QA/Code/Math | Rule/External | Yes |
| SPA-RL-Agent | PPO | Single | Process | Multi | Navigation/Web/TextGame | Model | No |
| verl-agent | PPO/GRPO/GiGPO/DAPO/RLOO/REINFORCE++ | Multi | Both | Multi | Phone Use/Math/Code/Web/TextGame | All | Yes |
| Github Repo | Date | Org | Paper Link | RL Framework |
|---|---|---|---|---|
| Tree-GRPO | 2025.9 | AMAP | Paper | veRL |
| ASearcher | 2025.8 | Ant Research RL Lab, Tsinghua University & UW | Paper | RealHF/AReaL |
| Kimi-Researcher | 2025.6 | Moonshot AI | blog | Custom |
| TTI | 2025.6 | CMU | Paper | Custom |
| R-Search | 2025.6 | Individual | -- | veRL |
| R1-Searcher-plus | 2025.5 | RUC | Paper | Custom |
| StepSearch | 2025.5 | SenseTime | Paper | veRL |
| AutoRefine | 2025.5 | USTC | Paper | veRL |
| ZeroSearch | 2025.5 | Alibaba | Paper | veRL |
| WebThinker | 2025.4 | RUC | Paper | Custom |
| DeepResearcher | 2025.4 | SJTU | Paper | veRL |
| Search-R1 | 2025.3 | UIUC/Google | paper1, paper2 | veRL |
| R1-Searcher | 2025.3 | RUC | Paper | OpenRLHF |
| C-3PO | 2025.2 | Alibaba | Paper | OpenRLHF |
| WebAgent | 2025.1 | Alibaba | paper1, paper2 | LLaMA-Factory |
📋 Click to view technical details
| Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
|---|---|---|---|---|---|---|---|
| Tree-GRPO | GRPO/Tree-GRPO | Single | Outcome | Multi | Search | Rule | Search |
| ASearcher | PPO/GRPO + Decoupled PPO | Single | Outcome | Multi | Math/Code/SearchQA | External/Rule | Yes |
| Kimi-Researcher | REINFORCE | Single | Outcome | Multi | Research | Outcome | Search, Browse, Coding |
| TTI | REINFORCE/BC | Single | Outcome | Multi | Web | External | Web Browsing |
| R-Search | PPO/GRPO | Single | Both | Multi | QA/Search | All | Yes |
| R1-Searcher-plus | Custom | Single | Outcome | Multi | Search | Model | Search |
| StepSearch | PPO | Single | Process | Multi | QA | Model | Search |
| AutoRefine | PPO/GRPO | Multi | Both | Multi | RAG QA | Rule | Search |
| ZeroSearch | PPO/GRPO/REINFORCE | Single | Outcome | Multi | QA/Search | Rule | Yes |
| WebThinker | DPO | Single | Outcome | Multi | Reasoning/QA/Research | Model/External | Web Browsing |
| DeepResearcher | PPO/GRPO | Multi | Outcome | Multi | Research | All | Yes |
| Search-R1 | PPO/GRPO | Single | Outcome | Multi | Search | All | Search |
| R1-Searcher | PPO/DPO | Single | Both | Multi | Search | All | Yes |
| C-3PO | PPO | Multi | Outcome | Multi | Search | Model | Yes |
| WebAgent | DAPO | Multi | Process | Multi | Web | Model | Yes |
| Github Repo | Date | Org | Paper Link | RL Framework |
|---|---|---|---|---|
| MobileAgent | 2025.9 | X-PLUG (TongyiQwen) | paper | veRL |
| InfiGUI-G1 | 2025.8 | InfiX AI | Paper | veRL |
| Grounding-R1 | 2025.6 | Salesforce | blog | trl |
| AgentCPM-GUI | 2025.6 | OpenBMB/Tsinghua/RUC | Paper | Huggingface |
| ARPO | 2025.5 | CUHK/HKUST | Paper | veRL |
| GUI-G1 | 2025.5 | RUC | Paper | TRL |
| GUI-R1 | 2025.4 | CAS/NUS | Paper | veRL |
| UI-R1 | 2025.3 | vivo/CUHK | Paper | TRL |
📋 Click to view technical details
| Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
|---|---|---|---|---|---|---|---|
| MobileAgent | semi-online RL | Single | Both | Multi | MobileGUI/Automation | Rule | Yes |
| InfiGUI-G1 | AEPO | Single | Outcome | Single | GUI/Grounding | Rule | No |
| Grounding-R1 | GRPO | Single | Outcome | Multi | GUI Grounding | Model | Yes |
| AgentCPM-GUI | GRPO | Single | Outcome | Multi | Mobile GUI | Model | Yes |
| ARPO | GRPO | Single | Outcome | Multi | GUI | External | Computer Use |
| GUI-G1 | GRPO | Single | Outcome | Single | GUI | Rule/External | No |
| GUI-R1 | GRPO | Single | Outcome | Multi | GUI | Rule | No |
| UI-R1 | GRPO | Single | Process | Both | GUI | Rule | Computer/Phone Use |
| Github Repo | Date | Org | Paper Link | RL Framework |
|---|---|---|---|---|
| MiroRL | 2025.8 | MiroMindAI | HF Repo | veRL |
| verl-tool | 2025.6 | TIGER-Lab | X | veRL |
| Multi-Turn-RL-Agent | 2025.5 | University of Minnesota | Paper | Custom |
| Tool-N1 | 2025.5 | NVIDIA | Paper | veRL |
| Tool-Star | 2025.5 | RUC | Paper | LLaMA-Factory |
| RL-Factory | 2025.5 | Simple-Efficient | model | veRL |
| ReTool | 2025.4 | ByteDance | Paper | veRL |
| AWorld | 2025.3 | Ant Group (inclusionAI) | Paper | veRL |
| Agent-R1 | 2025.3 | USTC | -- | veRL |
| ReCall | 2025.3 | BaiChuan | Paper | veRL |
📋 Click to view technical details
| Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
|---|---|---|---|---|---|---|---|
| MiroRL | GRPO | Single | Both | Multi | Reasoning/Planning/ToolUse | Rule-based | MCP |
| verl-tool | PPO/GRPO | Single | Both | Both | Math/Code | Rule/External | Yes |
| Multi-Turn-RL-Agent | GRPO | Single | Both | Multi | Tool-use/Math | Rule/External | Yes |
| Tool-N1 | PPO | Single | Outcome | Multi | Math/Dialogue | All | Yes |
| Tool-Star | PPO/DPO/ORPO/SimPO/KTO | Single | Outcome | Multi | Multi-modal/Tool Use/Dialogue | Model/External | Yes |
| RL-Factory | GRPO | Multi | Both | Multi | Tool-use/NL2SQL | All | MCP |
| ReTool | PPO | Single | Outcome | Multi | Math | External | Code |
| AWorld | GRPO | Both | Outcome | Multi | Search/Web/Code | External/Rule | Yes |
| Agent-R1 | PPO/GRPO | Single | Both | Multi | Tool-use/QA | Model | Yes |
| ReCall | PPO/GRPO/RLOO/REINFORCE++/ReMax | Single | Outcome | Multi | Tool-use/Math/QA | All | Yes |
| Github Repo | Date | Org | Paper Link | RL Framework |
|---|---|---|---|---|
| ARIA | 2025.6 | Fudan University | Paper | Custom |
| AMPO | 2025.5 | Tongyi Lab, Alibaba | Paper | veRL |
| Trinity-RFT | 2025.5 | Alibaba | Paper | veRL |
| VAGEN | 2025.3 | RAGEN-AI | Paper | veRL |
| ART | 2025.3 | OpenPipe | Paper | TRL |
| OpenManus-RL | 2025.3 | UIUC/MetaGPT | -- | Custom |
| RAGEN | 2025.1 | RAGEN-AI | Paper | veRL |
📋 Click to view technical details
| Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
|---|---|---|---|---|---|---|---|
| ARIA | REINFORCE | Both | Process | Multi | Negotiation/Bargaining | Other | No |
| AMPO | BC/AMPO(GRPO improvement) | Multi | Outcome | Multi | Social Interaction | Model-based | No |
| Trinity-RFT | PPO/GRPO | Single | Outcome | Both | Math/TextGame/Web | All | Yes |
| VAGEN | PPO/GRPO | Single | Both | Multi | TextGame/Navigation | All | Yes |
| ART | GRPO | Multi | Both | Multi | TextGame | All | Yes |
| OpenManus-RL | PPO/DPO/GRPO | Multi | Outcome | Multi | TextGame | All | Yes |
| RAGEN | PPO/GRPO | Single | Both | Multi | TextGame | All | Yes |
| Github Repo | Date | Org | Paper Link | RL Framework |
|---|---|---|---|---|
| RepoDeepSearch | 2025.8 | PKU, Bytedance, BIT | Paper | veRL |
| MedAgentGym | 2025.6 | Emory/Georgia Tech | Paper | Huggingface |
| CURE | 2025.6 | University of Chicago/Princeton/ByteDance | Paper | Huggingface |
| MASLab | 2025.5 | MASWorks | Paper | Custom |
| Time-R1 | 2025.5 | UIUC | Paper | veRL |
| ML-Agent | 2025.5 | MASWorks | Paper | Custom |
| SkyRL | 2025.4 | NovaSky | -- | veRL |
| digitalhuman | 2025.4 | Tencent | Paper | veRL |
| sweet_rl | 2025.3 | Meta/UCB | Paper | OpenRLHF |
| rllm | 2025.1 | Berkeley Sky Computing Lab/BAIR/Together AI | Notion Blog | veRL |
| open-r1 | 2025.1 | HuggingFace | -- | TRL |
📋 Click to view technical details
| Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
|---|---|---|---|---|---|---|---|
| RepoDeepSearch | GRPO | Single | Both | Multi | Search/Repair | Rule/External | Yes |
| MedAgentGym | SFT/DPO/PPO/GRPO | Single | Outcome | Multi | Medical/Code | External | Yes |
| CURE | PPO | Single | Outcome | Single | Code | External | No |
| MASLab | NO RL | Multi | Outcome | Multi | Code/Math/Reasoning | External | Yes |
| Time-R1 | PPO/GRPO/DPO | Multi | Outcome | Multi | Temporal | All | Code |
| ML-Agent | Custom | Single | Process | Multi | Code | All | Yes |
| SkyRL | PPO/GRPO | Single | Outcome | Multi | Math/Code | All | Code |
| digitalhuman | PPO/GRPO/ReMax/RLOO | Multi | Outcome | Multi | Empathy/Math/Code/MultimodalQA | Rule/Model/External | Yes |
| sweet_rl | DPO | Multi | Process | Multi | Design/Code | Model | Web Browsing |
| rllm | PPO/GRPO | Single | Outcome | Multi | Code Edit | External | Yes |
| open-r1 | GRPO | Single | Outcome | Single | Math/Code | All | Yes |
| Github Repo | Date | Org | Paper Link | RL Framework |
|---|---|---|---|---|
| AgentFlow | 2025.9 | Stanford University | arXiv | veRL |
| ARPO | 2025.7 | RUC, Kuaishou | Paper | veRL |
| terminal-bench-rl | 2025.7 | Individual (Danau5tin) | N/A | rLLM |
| MOTIF | 2025.6 | University of Maryland | Paper | trl |
| cmriat/l0 | 2025.6 | CMRIAT | Paper | veRL |
| agent-distillation | 2025.5 | KAIST | Paper | Custom |
| VDeepEyes | 2025.5 | Xiaohongshu/XJTU | Paper | veRL |
| EasyR1 | 2025.4 | Individual | repo1/paper2 | veRL |
| AutoCoA | 2025.3 | BJTU | Paper | veRL |
| ToRL | 2025.3 | SJTU | Paper | veRL |
| ReMA | 2025.3 | SJTU, UCL | Paper | veRL |
| Agentic-Reasoning | 2025.2 | Oxford | Paper | Custom |
| SimpleTIR | 2025.2 | NTU, Bytedance | Notion Blog | veRL |
| openrlhf_async_pipline | 2024.5 | OpenRLHF | Paper | OpenRLHF |
📋 Click to view technical details
| Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
|---|---|---|---|---|---|---|---|
| AgentFlow | Flow-GRPO | Single | Outcome | Multi | Search/Math/QA | Model/External | Yes |
| ARPO | GRPO | Single | Outcome | Multi | Math/Coding | Model/Rule | Yes |
| terminal-bench-rl | GRPO | Single | Outcome | Multi | Coding/Terminal | Model+External Verifier | Yes |
| MOTIF | GRPO | Single | Outcome | Multi | QA | Rule | No |
| cmriat/l0 | PPO | Multi | Process | Multi | QA | All | Yes |
| agent-distillation | PPO | Single | Process | Multi | QA/Math | External | Yes |
| VDeepEyes | PPO/GRPO | Multi | Process | Multi | VQA | All | Yes |
| EasyR1 | GRPO | Single | Process | Multi | Vision-Language | Model | Yes |
| AutoCoA | GRPO | Multi | Outcome | Multi | Reasoning/Math/QA | All | Yes |
| ToRL | GRPO | Single | Outcome | Single | Math | Rule/External | Yes |
| ReMA | PPO | Multi | Outcome | Multi | Math | Rule | No |
| Agentic-Reasoning | Custom | Single | Process | Multi | QA/Math | External | Web Browsing |
| SimpleTIR | PPO/GRPO (with extensions) | Single | Outcome | Multi | Math, Coding | All | Yes |
| openrlhf_async_pipline | PPO/REINFORCE++/DPO/RLOO | Single | Outcome | Multi | Dialogue/Reasoning/QA | All | No |
| Github Repo | Date | Org | Paper Link | RL Framework |
|---|---|---|---|---|
| MEM1 | 2025.7 | MIT | Paper | veRL (based on Search-R1) |
| Memento | 2025.6 | UCL, Huawei | Paper | Custom |
| MemAgent | 2025.6 | Bytedance, Tsinghua-SIA | Paper | veRL |
📋 Click to view technical details
| Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
|---|---|---|---|---|---|---|---|
| MEM1 | PPO/GRPO | Single | Outcome | Multi | WebShop/GSM8K/QA | Rule/Model | Yes |
| Memento | soft Q-Learning | Single | Outcome | Multi | Research/QA/Code/Web | External/Rule | Yes |
| MemAgent | PPO, GRPO, DPO | Multi | Outcome | Multi | Long-context QA | Rule/Model/External | Yes |
| Github Repo | Date | Org | Paper Link | RL Framework |
|---|---|---|---|---|
| Embodied-R1 | 2025.6 | Tianjin University | Paper | veRL |
📋 Click to view technical details
| Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
|---|---|---|---|---|---|---|---|
| Embodied-R1 | GRPO | Single | Outcome | Single | Grounding/Waypoint | Rule | No |
| Github Repo | Date | Org | Paper Link | RL Framework |
|---|---|---|---|---|
| MMedAgent-RL | 2025.8 | Unknown | paper | Unknown |
| DoctorAgent-RL | 2025.5 | UCAS/CAS/USTC | Paper | RAGEN |
| Biomni | 2025.3 | Stanford University (SNAP) | Paper | Custom |
📋 Click to view technical details
| Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
|---|---|---|---|---|---|---|---|
| MMedAgent-RL | Unknown | Multi | Unknown | Unknown | Unknown | Unknown | Unknown |
| DoctorAgent-RL | GRPO | Multi | Both | Multi | Consultation/Diagnosis | Model/Rule | No |
| Biomni | TBD | Single | TBD | Single | scRNAseq/CRISPR/ADMET/Knowledge | TBD | Yes |
| Github Repo | Date | Org | Task |
|---|---|---|---|
| CompassVerifier | 2025.7 | Shanghai AI Lab | Knowledge/Math/Science/GeneralReasoning |
| Mind2Web-2 | 2025.6 | Ohio State University | Web |
| gem | 2025.5 | Sea AI Lab | Math/Code/Game/QA |
| MLE-Dojo | 2025.5 | GIT, Stanford | MLE |
| atropos | 2025.4 | Nous Research | Game/Code/Tool |
| InternBootcamp | 2025.4 | InternBootcamp | Coding/QA/Game |
| loong | 2025.3 | CAMEL-AI.org | RLVR |
| reasoning-gym | 2025.1 | open-thought | Math/Game |
| llmgym | 2025.1 | tensorzero | TextGame/Tool |
| debug-gym | 2024.11 | Microsoft Research | Debugging/Game/Code |
| gym-llm | 2024.8 | Rodrigo Sánchez Molina | Control/Game |
| AgentGym | 2024.6 | Fudan | Web/Game |
| tau-bench | 2024.6 | Sierra | Tool |
| appworld | 2024.6 | Stony Brook University | Phone Use |
| android_world | 2024.5 | Google Research | Phone Use |
| TheAgentCompany | 2024.3 | CMU, Duke | Coding |
| LlamaGym | 2024.3 | Rohan Pandey | Game |
| visualwebarena | 2024.1 | CMU | Web |
| LMRL-Gym | 2023.12 | UC Berkeley | Game |
| OSWorld | 2023.10 | HKU, CMU, Salesforce, Waterloo | Computer Use |
| webarena | 2023.7 | CMU | Web |
| AgentBench | 2023.7 | Tsinghua University | Game/Web/QA/Tool |
| WebShop | 2022.7 | Princeton-NLP | Web |
| ScienceWorld | 2022.3 | AllenAI | TextGame/ScienceQA |
| factorio-learning-environment | 2021.6 | JackHopkins | Game |
| alfworld | 2020.10 | Microsoft, CMU, UW | Embodied |
| jericho | 2018.10 | Microsoft, GIT | TextGame |
| TextWorld | 2018.6 | Microsoft Research | TextGame |
- JoyAgents-R1: Joint Evolution Dynamics for Versatile Multi-LLM Agents with Reinforcement Learning
- Shop-R1: Rewarding LLMs to Simulate Human Behavior in Online Shopping via Reinforcement Learning
- Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning
- Acting Less is Reasoning More! Teaching Model to Act Efficiently
- Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
- ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
- Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward
- MUA-RL: Multi-Turn User-Interacting Agent Reinforcement Learning for Agentic Tool Use
- Understanding Tool-Integrated Reasoning
- Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning
- Encouraging Good Processes Without the Need for Good Answers: Reinforcement Learning for LLM Agent Planning
- SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents
- WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents
- EnvX: Agentize Everything with Agentic AI
- UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
- UI-Venus Technical Report: Building High-performance UI Agents with RFT
- Agent2: An Agent-Generates-Agent Framework for Reinforcement Learning Automation
If you find this repository useful, please consider citing it:

    @misc{agentsMeetRL,
      title={When LLM Agents Meet Reinforcement Learning: A Comprehensive Survey},
      author={AgentsMeetRL Contributors},
      year={2025},
      url={https://github.com/thinkwee/agentsMeetRL}
    }

Made with ❤️ by the AgentsMeetRL community