dirtycomputer

Starred repositories

[ICML 2025] Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction

Python 79 7 Updated May 26, 2025

siiRL: Shanghai Innovation Institute RL Framework for Advanced LLMs and Multi-Agent Systems

Python 271 22 Updated Nov 27, 2025

Tools and pipelines for automated LLM performance evaluation

Python 12 20 Updated Nov 10, 2025
JavaScript 1 2 Updated Nov 25, 2025

The official implementation of Self-Play Fine-Tuning (SPIN)

Python 1,221 102 Updated May 8, 2024

f-PO: Generalizing Preference Optimization with f-divergence Minimization

Python 13 Updated Apr 2, 2025

A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models

Python 68 5 Updated Feb 25, 2025

We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that Sora-2 surpasses GPT5 by 10% on eyeballing puzzles and reache…

Python 212 4 Updated Nov 24, 2025

slime is an LLM post-training framework for RL Scaling.

Python 2,614 288 Updated Nov 28, 2025

[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL

Python 1,649 98 Updated Nov 4, 2025

A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more

Jupyter Notebook 4,315 1,131 Updated Jan 1, 2025

Litex is a simple formal language learnable in 2 hours.

Go 593 8 Updated Nov 28, 2025

A curated list of resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning capability boundary of Large Language Models (LLMs).

81 7 Updated Oct 23, 2025

🔍 Explore curated resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning limits of Large Language Models (LLMs).

4 Updated Nov 28, 2025

Code repository for "RL Grokking Recipe: How RL Unlocks and Transfers New Algorithms in LLMs"

Python 23 Updated Oct 12, 2025

Official Code of EyeReal

C++ 53 2 Updated Nov 28, 2025

[R]einforcement [L]earning from [M]odel-rewarded [T]hinking - code for the paper "Language Models That Think, Chat Better"

Python 120 6 Updated Oct 27, 2025

Calibrate the camera with Zhang Zhengyou's method (in both the distortion and no-distortion cases)

Python 548 145 Updated Mar 18, 2024

Python implementation of Zhang's camera calibration method

Jupyter Notebook 1 1 Updated Feb 27, 2023

Implementation of Zhang, Z., "A Flexible New Technique for Camera Calibration" (2000).

Python 75 18 Updated May 12, 2019

Intro to Reinforcement Learning (强化学习纲要)

3,515 504 Updated Jul 25, 2020

RLP: Reinforcement as a Pretraining Objective

201 13 Updated Oct 5, 2025

The best ChatGPT that $100 can buy.

Python 37,719 4,631 Updated Nov 17, 2025

MMD, Hausdorff and Sinkhorn divergences scaled up to 1,000,000 samples.

Python 57 12 Updated Apr 15, 2019

Implementation of Wasserstein Natural Policy Gradients and Wasserstein Natural Evolution Strategies

Python 13 3 Updated Mar 9, 2021

A library of reinforcement learning components and agents

Python 3,856 511 Updated Sep 26, 2025

Source code of our ICML 2025 paper "Flowing Datasets with Wasserstein over Wasserstein Gradient Flows"

Jupyter Notebook 16 2 Updated May 21, 2025

Moved to https://github.com/HINTLab/LeFusion

Python 5 Updated Sep 18, 2025
Python 136 13 Updated Sep 28, 2025