University of Maryland, College Park
- https://si0wang.github.io/
Stars
An open-source implementation for fine-tuning Qwen-VL series by Alibaba Cloud.
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)
A fork to add multimodal model training to open-r1
The official implementation of Natural Language Fine-Tuning
[NIPS'25 Spotlight] Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS
[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other models
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.
PyTorch implementation of DreamerV3, Mastering Diverse Domains through World Models.
Implementation of Dreamer v3 in PyTorch.
robosuite: A Modular Simulation Framework and Benchmark for Robot Learning
A curated list of awesome model based RL resources (continually updated)
Benchmark for Continuous Multi-Agent Robotic Control, based on OpenAI's Mujoco Gym environments.