dirtycomputer

Starred repositories

f-PO: Generalizing Preference Optimization with f-divergence Minimization

Python 13 Updated Apr 2, 2025

A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models

Python 67 5 Updated Feb 25, 2025

We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that Sora-2 surpasses GPT5 by 10% on eyeballing puzzles and reache…

166 3 Updated Nov 10, 2025

slime is an LLM post-training framework for RL Scaling.

Python 2,444 248 Updated Nov 11, 2025

[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL

Python 1,569 88 Updated Nov 4, 2025

A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more

Jupyter Notebook 4,301 1,122 Updated Jan 1, 2025

Litex is a simple formal language learnable in 2 hours.

Go 575 7 Updated Nov 10, 2025

A curated list of resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning capability boundary of Large Language Models (LLMs).

75 6 Updated Oct 23, 2025

🔍 Explore curated resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning limits of Large Language Models (LLMs).

4 Updated Nov 12, 2025

Code repository for "RL Grokking Recipe: How RL Unlocks and Transfers New Algorithms in LLMs"

Python 22 Updated Oct 12, 2025

Official code of EyeReal

C++ 15 1 Updated Sep 23, 2025

[R]einforcement [L]earning from [M]odel-rewarded [T]hinking - code for the paper "Language Models That Think, Chat Better"

Python 118 6 Updated Oct 27, 2025

Calibrate the camera with Zhang Zhengyou's method (both with and without distortion)

Python 543 145 Updated Mar 18, 2024

Python implementation of Zhang's camera calibration method

Jupyter Notebook 1 1 Updated Feb 27, 2023

Implementation of Zhang, Z., "A Flexible New Technique for Camera Calibration" (2000).

Python 74 18 Updated May 12, 2019

Intro to Reinforcement Learning (强化学习纲要)

3,493 503 Updated Jul 25, 2020

RLP: Reinforcement as a Pretraining Objective

200 13 Updated Oct 5, 2025

The best ChatGPT that $100 can buy.

Python 36,414 4,362 Updated Nov 5, 2025

MMD, Hausdorff and Sinkhorn divergences scaled up to 1,000,000 samples.

Python 57 12 Updated Apr 15, 2019

Implementation of Wasserstein Natural Policy Gradients and Wasserstein Natural Evolution Strategies

Python 13 3 Updated Mar 9, 2021

A library of reinforcement learning components and agents

Python 3,841 506 Updated Sep 26, 2025

Source code of our ICML 2025 paper "Flowing Datasets with Wasserstein over Wasserstein Gradient Flows"

Jupyter Notebook 15 2 Updated May 21, 2025

Moved to https://github.com/HINTLab/LeFusion

Python 5 Updated Sep 18, 2025
Python 123 9 Updated Sep 28, 2025

A Survey of Reinforcement Learning for Large Reasoning Models

TeX 2,022 112 Updated Nov 9, 2025

Mirror Descent Policy Optimization

Python 41 3 Updated Oct 31, 2020

Awesome Reasoning LLM Tutorial/Survey/Guide

Python 2,139 149 Updated Oct 14, 2025

A curated list of awesome Deep Reinforcement Learning resources.

827 79 Updated Jul 13, 2025
Python 52 4 Updated Jul 21, 2025