PSFT is a trust-region–inspired fine-tuning objective that views SFT as a policy gradient method with constant advantages, constraining policy drift to stabilize training and improve generalization.

Python · 34 stars · 1 fork · Updated Sep 9, 2025

cyzcz / Tase

Python · 10 stars · Updated Aug 11, 2025

zhaochenyang20 / Awesome-ML-SYS-Tutorial

My learning notes for ML SYS.

Python · 5,014 stars · 325 forks · Updated Jan 8, 2026

OpenPipe / ART

Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!

Python · 8,134 stars · 645 forks · Updated Jan 10, 2026

TsinghuaC3I / MARTI

A Framework for LLM-based Multi-Agent Reinforced Training and Inference

Python · 386 stars · 43 forks · Updated Nov 20, 2025

LantaoYu / MARL-Papers

Paper list of multi-agent reinforcement learning (MARL)

4,664 stars · 766 forks · Updated Nov 19, 2025

langfengQ / verl-agent

verl-agent is an extension of veRL designed for training LLM/VLM agents via RL. It is also the official code for the paper "Group-in-Group Policy Optimization for LLM Agent Training".

Python · 1,382 stars · 118 forks · Updated Dec 11, 2025

xindaaW

Stars

xindaaW / EvolvR

XxxXTeam / business2api

allenai / duplodocus

zou-group / textgrad

CSU-JPG / VCode

rllm-org / rllm

TsinghuaC3I / Awesome-RL-for-LRMs