Stars
Bridge Megatron-Core to Hugging Face/Reinforcement Learning
MemOS (Preview) | Intelligence Begins with Memory
slime is an LLM post-training framework for RL Scaling.
A high-throughput and memory-efficient inference and serving engine for LLMs
Scalable toolkit for efficient model reinforcement
Muon is an optimizer for hidden layers in neural networks
Unleashing the Power of Reinforcement Learning for Math and Code Reasoners
Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling
Democratizing Reinforcement Learning for LLMs
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
FireFlyer Record file format, writer and reader for DL training samples.
Official Repo for Open-Reasoner-Zero
LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently (ICML2025 Oral)
A Self-adaptation Framework 🐙 that adapts LLMs to unseen tasks in real time!
🤗 smolagents: a barebones library for agents that think in code.
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.
Minimal reproduction of DeepSeek R1-Zero
Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!
Fully open reproduction of DeepSeek-R1
[NAACL 2025] MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
PKU-DAIR / Hetu
Forked from Hsword/Hetu. A high-performance distributed deep learning system targeting large-scale and automated distributed training.