Beihang University
Haidian
UTC +08:00
dirtycomputer.github.io
Starred repositories
f-PO: Generalizing Preference Optimization with f-divergence Minimization
A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that Sora-2 surpasses GPT5 by 10% on eyeballing puzzles and reache…
slime is an LLM post-training framework for RL scaling.
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
A clean AlphaZero-based implementation for any game in any framework, with a tutorial and examples for Othello, Gobang, TicTacToe, Connect4, and more
Litex is a simple formal language that can be learned in 2 hours.
A curated list of resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning capability boundary of Large Language Models (LLMs).
🔍 Explore curated resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning limits of Large Language Models (LLMs).
Code repository for "RL Grokking Recipe: How RL Unlocks and Transfers New Algorithms in LLMs"
[R]einforcement [L]earning from [M]odel-rewarded [T]hinking - code for the paper "Language Models That Think, Chat Better"
Calibrate a camera with Zhang Zhengyou's method (both with and without lens distortion)
Python implementation of Zhang's camera calibration method
Implementation of Zhang, Z., "A Flexible New Technique for Camera Calibration" (2000).
MMD, Hausdorff and Sinkhorn divergences scaled up to 1,000,000 samples.
Implementation of Wasserstein Natural Policy Gradients and Wasserstein Natural Evolution Strategies
A library of reinforcement learning components and agents
Source code of our ICML 2025 paper "Flowing Datasets with Wasserstein over Wasserstein Gradient Flows"
M3DV / LeFusion
Forked from HINTLab/LeFusion
Moved to https://github.com/HINTLab/LeFusion
A Survey of Reinforcement Learning for Large Reasoning Models
Mirror Descent Policy Optimization
Awesome Reasoning LLM Tutorial/Survey/Guide
A curated list of awesome Deep Reinforcement Learning resources.