The open-source code for the NeurIPS 2025 paper, "Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning."

Python 26 1 Updated Nov 14, 2025

Demystifying Reinforcement Learning in Agentic Reasoning

Python 116 21 Updated Oct 14, 2025

Official Code for NeurIPS'25 ER Workshop "The Zero-Step Thinking: An Empirical Study of Mode Selection as a Harder Early Exit Problem in Reasoning Models"

Python 3 Updated Oct 19, 2025

SimKO: Simple Pass@K Policy Optimization

Python 21 1 Updated Oct 24, 2025

MARS: Reinforcing Multi-Agent Reasoning of LLMs through Self-Play in Strategic Games

Python 13 Updated Nov 10, 2025

MemGen: Weaving Generative Latent Memory for Self-Evolving Agents

Python 180 15 Updated Nov 1, 2025

Post-training with Tinker

Python 1,979 154 Updated Nov 17, 2025

The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.

Python 381 12 Updated Jul 11, 2025

A curated list of awesome resources about reward construction for AI agents. This repository covers cutting-edge research, and practical guides on defining and collecting rewards to build more inte…

49 Updated Sep 1, 2025

Checkpoint-engine is a simple middleware to update model weights in LLM inference engines

Python 829 66 Updated Nov 12, 2025

Revealing and Mitigating the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing (EMNLP 2025 main)

5 Updated Sep 7, 2025
Python 249 12 Updated May 14, 2025

A Gym for Agentic LLMs

Python 360 22 Updated Nov 10, 2025

🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

Python 565 47 Updated Oct 31, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 20,192 3,408 Updated Nov 17, 2025

Layer curation pipeline used in LayerAnimate [ICCV 2025]

Python 4 Updated Aug 22, 2025

Verlog: A Multi-turn RL framework for LLM agents

Python 64 6 Updated Nov 4, 2025
Python 451 35 Updated Aug 28, 2025

[Preprint] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification.

Python 498 20 Updated Nov 5, 2025

A collection of verifiable games in sotopia format

Python 3 Updated Aug 16, 2025

[ACL 2025] Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence

Python 11 Updated Jun 10, 2025

Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

Python 3,009 233 Updated Nov 17, 2025

Training and inference code for "Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning"

Python 42 3 Updated Jan 20, 2025
Python 23 2 Updated Oct 13, 2024
Python 40 13 Updated Jul 22, 2024
Python 60 Updated Mar 22, 2024

This is the official implementation of Multi-Agent PPO (MAPPO).

Python 1,761 350 Updated Jul 18, 2024