Official repository for DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

Python 391 25 Updated Nov 28, 2025

Post-training with Tinker

Python 2,238 189 Updated Nov 25, 2025

Democratizing Reinforcement Learning for LLMs

Jupyter Notebook 4,784 452 Updated Nov 27, 2025

Scaling RL on advanced reasoning models

Python 640 40 Updated Oct 20, 2025

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 80,029 11,911 Updated Nov 25, 2025

s1: Simple test-time scaling

Python 6,606 763 Updated Jun 25, 2025

A scalable asynchronous reinforcement learning implementation with in-flight weight updates.

Python 316 30 Updated Nov 28, 2025

Environments for LLM Reinforcement Learning

Python 3,551 441 Updated Nov 28, 2025

[COLM 2025] Official code for "When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning"

Python 13 1 Updated Oct 31, 2025

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

Python 2,398 164 Updated Nov 28, 2025

Scalable RL solution for advanced reasoning of language models

Python 1,777 99 Updated Mar 18, 2025

Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs"

Jupyter Notebook 561 51 Updated Oct 7, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…

Python 11,325 1,011 Updated Nov 28, 2025

Material for gpu-mode lectures

Jupyter Notebook 5,358 538 Updated Nov 21, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 16,819 2,677 Updated Nov 27, 2025
Python 32 14 Updated Aug 21, 2025

Team Neko (the preliminary)

Python 7 Updated Oct 23, 2025

Become a Supercomputer Programmer in One Week!

HTML 723 29 Updated Apr 10, 2025

Fully open reproduction of DeepSeek-R1

Python 25,693 2,402 Updated Nov 24, 2025
Python 92 12 Updated Nov 27, 2025

Data Cleaning using LLMs

Jupyter Notebook 8 Updated Mar 17, 2024

Train transformer language models with reinforcement learning.

Python 16,456 2,322 Updated Nov 28, 2025

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

Python 31,799 6,549 Updated Nov 28, 2025

A very fast and expressive template engine.

Python 11,302 1,684 Updated Jun 14, 2025

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.

Python 48,781 4,015 Updated Nov 28, 2025

Large language models (LLMs) made easy. EasyLM is a one-stop solution for pre-training, fine-tuning, evaluating, and serving LLMs in JAX/Flax.

Python 76 16 Updated Aug 17, 2024

Reproducible, flexible LLM evaluations

Python 287 55 Updated Nov 20, 2025

Pretraining Efficiently on S2ORC!

Python 173 6 Updated Oct 23, 2024

Official Repo for Open-Reasoner-Zero

Python 2,068 119 Updated Jun 2, 2025