- Pretending in Hangzhou Creative Culture Company (PH3C)
- Beijing(wangduo.cnblogs.com)
- zhihu.com/people/wangduo2014
Stars
An Application Framework for AI Engineering
Agentic AI Framework for Java Developers
A construction kit for reinforcement learning environment management.
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
[ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
RewardBench: the first evaluation tool for reward models.
Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL
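A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and are inexpensive to train, including base models, vertical-domain fine-tuning and applications, datasets, and tutorials.
The official code for "ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases"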
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
An Open-source RL System from ByteDance Seed and Tsinghua AIR
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
verl: Volcano Engine Reinforcement Learning for LLMs
SGLang is a high-performance serving framework for large language models and multimodal models.
An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. The goal of this repo is to provide the si…
Self-Reflection in LLM Agents: Effects on Problem-Solving Performance
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)
Fine-Grained Open Domain Image Animation with Motion Guidance
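Very long text classification (more than 1,000 characters); document-level/discourse-level text classification; mainly addresses the long-range dependency problem.
Code and source for the paper "How to Fine-Tune BERT for Text Classification?"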
Awesome papers about generative Information Extraction (IE) using Large Language Models (LLMs)
Neo4j graph construction from unstructured data using LLMs
Modeling, training, eval, and inference code for OLMo
Open source Python library for converting PDF to DOCX.