
Python 13 1 Updated Dec 10, 2025
Python 24 1 Updated Jun 18, 2025
Jupyter Notebook 31 3 Updated Jan 6, 2026

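A rehabilitation guide for cervical spondylosis and lumbar disc herniation, offering programmers simple and reliable recovery guidance.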

Python 3,417 220 Updated Dec 25, 2023
Python 105 6 Updated Jun 10, 2025

An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.

Python 1,565 189 Updated Jan 10, 2026

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)

Python 8,763 848 Updated Jan 8, 2026

A fork to add multimodal model training to open-r1

Python 1,434 70 Updated Feb 8, 2025

Simple RL training for reasoning

Python 3,824 283 Updated Dec 23, 2025
Python 46 5 Updated Dec 30, 2024

The official implementation of Natural Language Fine-Tuning

Python 54 4 Updated Jan 7, 2025

[NIPS'25 Spotlight] Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS

Python 1,232 112 Updated Sep 19, 2025

[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other models

Python 412 26 Updated Jun 25, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 65,387 7,946 Updated Jan 9, 2026

Source code for Self-Evaluation Guided MCTS for online DPO.

Jupyter Notebook 328 37 Updated Aug 6, 2024
Python 9 Updated Apr 30, 2025
Python 8 1 Updated Jul 1, 2024
Jupyter Notebook 32 Updated Feb 8, 2024
Python 23 2 Updated Apr 2, 2024
Jupyter Notebook 7 1 Updated Feb 28, 2024
Python 121 21 Updated Nov 25, 2025

PyTorch implementation of DreamerV3, Mastering Diverse Domains through World Models.

Python 10 2 Updated Feb 16, 2024

A distributed deep learning platform

C++ 3,584 1,267 Updated Jan 10, 2026

Simple maze environments using mujoco-py

Python 58 12 Updated Dec 27, 2023

Implementation of DreamerV3 in PyTorch.

Python 764 191 Updated Sep 27, 2024

robosuite: A Modular Simulation Framework and Benchmark for Robot Learning

Python 2,142 638 Updated Jan 7, 2026

A curated list of awesome model based RL resources (continually updated)

1,271 73 Updated Dec 20, 2025

Benchmark for Continuous Multi-Agent Robotic Control, based on OpenAI's MuJoCo Gym environments.

Python 364 36 Updated Mar 16, 2023