buaa42wxy dirtycomputer

54 followers · 219 following

Beihang University
haidian
dirtycomputer.github.io

Lists (1)

🔮 Future ideas

1 repository

Starred repositories

ChenWu98 / algorithmic-creativity

[ICML 2025] Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction

Python · 79 stars · 7 forks · Updated May 26, 2025

sii-research / siiRL

siiRL: Shanghai Innovation Institute RL Framework for Advanced LLMs and Multi-Agent Systems

Python · 271 stars · 22 forks · Updated Nov 27, 2025

IBM / fmwork

Tools and pipelines for automated LLM performance evaluation

Python · 12 stars · 20 forks · Updated Nov 10, 2025

OpenMOSS / OpenMOSS.github.io

JavaScript · 1 star · 2 forks · Updated Nov 25, 2025

uclaml / SPIN

The official implementation of Self-Play Fine-Tuning (SPIN)

Python · 1,221 stars · 102 forks · Updated May 8, 2024

MinkaiXu / fPO

f-PO: Generalizing Preference Optimization with f-divergence Minimization

Python · 13 stars · Updated Apr 2, 2025

SynthLabsAI / big-math

A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models

Python · 68 stars · 5 forks · Updated Feb 25, 2025

martinarjovsky / WassersteinGAN

Python · 3,240 stars · 729 forks · Updated Dec 26, 2018

tongjingqi / Thinking-with-Video

We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that Sora-2 surpasses GPT5 by 10% on eyeballing puzzles and reache…

Python · 212 stars · 4 forks · Updated Nov 24, 2025

THUDM / slime

slime is an LLM post-training framework for RL Scaling.

Python · 2,614 stars · 288 forks · Updated Nov 28, 2025

yifan123 / flow_grpo

[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL

Python · 1,649 stars · 98 forks · Updated Nov 4, 2025

suragnair / alpha-zero-general

A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more

Jupyter Notebook · 4,315 stars · 1,131 forks · Updated Jan 1, 2025

litexlang / golitex

Litex is a simple formal language learnable in 2 hours.

Go · 593 stars · 8 forks · Updated Nov 28, 2025

rdi-berkeley / awesome-RLVR-boundary

A curated list of resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning capability boundary of Large Language Models (LLMs).

81 stars · 7 forks · Updated Oct 23, 2025