- Institute of Automation, Chinese Academy of Sciences
- Beijing
- UTC +08:00
- https://mozerwang.github.io
- @minzheng_wang
- https://scholar.google.com/citations?user=glV21ZsAAAAJ
- https://www.semanticscholar.org/author/Minzheng-Wang/2264515707
Stars
The open-source code for the NeurIPS 2025 paper, "Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning."
Demystifying Reinforcement Learning in Agentic Reasoning
Official Code for NeurIPS'25 ER Workshop "The Zero-Step Thinking: An Empirical Study of Mode Selection as a Harder Early Exit Problem in Reasoning Models"
MARS: Reinforcing Multi-Agent Reasoning of LLMs through Self-Play in Strategic Games
MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
Post-training with Tinker
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
A curated list of awesome resources about reward construction for AI agents. This repository covers cutting-edge research and practical guides on defining and collecting rewards to build more inte…
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
Revealing and Mitigating the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing (EMNLP 2025 main)
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
SGLang is a fast serving framework for large language models and vision language models.
Layer curation pipeline used in LayerAnimate [ICCV 2025]
WentseChen / Verlog
Forked from volcengine/verl
Verlog: A Multi-turn RL framework for LLM agents
[Preprint] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification.
A collection of verifiable games in sotopia format
[ACL 2025] Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
Training and inference code for "Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning"
This is the official implementation of Multi-Agent PPO (MAPPO).