Stars
EvolvR: Self-Evolving Pairwise Reasoning for Story Evaluation to Enhance Generation
Tooling for exact and MinHash deduplication of large-scale text datasets
TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients. Published in Nature.
A Survey of Reinforcement Learning for Large Reasoning Models
Tongyi Deep Research, the Leading Open-source Deep Research Agent
PSFT is a trust-region-inspired fine-tuning objective that views SFT as a policy gradient method with constant advantages, constraining policy drift to stabilize training and improve generalization.
My learning notes for ML SYS.
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!
A Framework for LLM-based Multi-Agent Reinforced Training and Inference
Paper list of multi-agent reinforcement learning (MARL)
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for the paper "Group-in-Group Policy Optimization for LLM Agent Training"