Stars
Bridge Megatron-Core to Hugging Face/Reinforcement Learning
MemOS (Preview) | Intelligence Begins with Memory
slime is an LLM post-training framework for RL Scaling.
A high-throughput and memory-efficient inference and serving engine for LLMs
Scalable toolkit for efficient model reinforcement
Muon is an optimizer for hidden layers in neural networks
Unleashing the Power of Reinforcement Learning for Math and Code Reasoners
Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling
Democratizing Reinforcement Learning for LLMs
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
FireFlyer Record file format, writer and reader for DL training samples.
Official Repo for Open-Reasoner-Zero
LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently (ICML2025 Oral)
A Self-adaptation Framework 🐙 that adapts LLMs to unseen tasks in real time!
🤗 smolagents: a barebones library for agents that think in code.
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.
Minimal reproduction of DeepSeek R1-Zero
Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!
Fully open reproduction of DeepSeek-R1
[NAACL 2025] MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
PKU-DAIR / Hetu
Forked from Hsword/Hetu. A high-performance distributed deep learning system targeting large-scale and automated distributed training.