EvolvR: Self-Evolving Pairwise Reasoning for Story Evaluation to Enhance Generation

Python 3 Updated Aug 8, 2025

OpenAI/Gemini-compatible Gemini Business API proxy service

Go 169 35 Updated Dec 31, 2025

Tooling for exact and MinHash deduplication of large-scale text datasets

Rust 51 4 Updated Jan 9, 2026

TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients. Published in Nature.

Python 3,274 266 Updated Jul 25, 2025

VCode: SVG as Symbolic Visual Representation

Python 118 6 Updated Dec 19, 2025

Democratizing Reinforcement Learning for LLMs

Python 4,963 477 Updated Jan 10, 2026

A Survey of Reinforcement Learning for Large Reasoning Models

TeX 2,239 122 Updated Nov 9, 2025

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python 17,904 1,377 Updated Jan 8, 2026

PSFT is a trust-region–inspired fine-tuning objective that views SFT as a policy gradient method with constant advantages, constraining policy drift to stabilize training and improve generalization.

Python 34 1 Updated Sep 9, 2025
Python 10 Updated Aug 11, 2025

My learning notes for ML SYS.

Python 5,014 325 Updated Jan 8, 2026

Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!

Python 8,134 645 Updated Jan 10, 2026

A Framework for LLM-based Multi-Agent Reinforced Training and Inference

Python 386 43 Updated Nov 20, 2025

Paper list of multi-agent reinforcement learning (MARL)

4,664 766 Updated Nov 19, 2025

verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for the paper "Group-in-Group Policy Optimization for LLM Agent Training".

Python 1,382 118 Updated Dec 11, 2025