🎯
Focusing

Organizations

@PaddlePaddle


Bridge Megatron-Core to Hugging Face/Reinforcement Learning

Python 120 17 Updated Sep 5, 2025

MemOS (Preview) | Intelligence Begins with Memory

Python 2,452 213 Updated Sep 11, 2025

slime is an LLM post-training framework for RL scaling.

Python 1,752 156 Updated Sep 12, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 57,980 10,106 Updated Sep 14, 2025
Python 42 4 Updated Jul 21, 2025

Scalable toolkit for efficient model reinforcement

Python 859 130 Updated Sep 14, 2025

Decentralized RL Training at Scale

Python 590 95 Updated Sep 14, 2025

Muon is an optimizer for hidden layers in neural networks

Python 1,707 78 Updated Jul 12, 2025

Unleashing the Power of Reinforcement Learning for Math and Code Reasoners

Python 710 44 Updated Jun 6, 2025

Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling

Python 443 19 Updated May 17, 2025

Democratizing Reinforcement Learning for LLMs

Jupyter Notebook 4,154 385 Updated Sep 11, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 3,937 371 Updated Sep 14, 2025

FireFlyer Record file format, writer and reader for DL training samples.

Python 233 25 Updated Dec 1, 2022

Distributed RL System for LLM Reasoning

Python 2,600 171 Updated Sep 14, 2025

Official Repo for Open-Reasoner-Zero

Python 2,039 111 Updated Jun 2, 2025
Python 209 8 Updated Feb 20, 2025
Python 930 42 Updated Jul 2, 2025

LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently (ICML2025 Oral)

Python 17 1 Updated Jun 10, 2025

A Self-adaptation Framework🐙 that adapts LLMs for unseen tasks in real-time!

Python 1,141 132 Updated Jan 30, 2025

🤗 smolagents: a barebones library for agents that think in code.

Python 22,791 1,998 Updated Sep 12, 2025

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.

Python 45,445 3,692 Updated Sep 14, 2025

Minimal reproduction of DeepSeek R1-Zero

Python 12,185 1,501 Updated Apr 24, 2025

Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!

Python 8,459 644 Updated Sep 12, 2025

Fully open reproduction of DeepSeek-R1

Python 25,426 2,374 Updated Sep 8, 2025

[NAACL 2025] MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning

Python 19 Updated May 31, 2025

OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning

Python 148 3 Updated Dec 24, 2024

A high-performance distributed deep learning system targeting large-scale and automated distributed training.

Python 322 38 Updated Jul 28, 2025