Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda 946 96 Updated Dec 30, 2024

Triton implementation of Flash Attention 2.0

Python 40 5 Updated Jul 31, 2023

[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Python 1,189 97 Updated Oct 6, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,798 717 Updated Oct 15, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 15,184 1,092 Updated Oct 12, 2025

Minimal hackable GRPO implementation

Python 293 41 Updated Jan 31, 2025

Implementation of papers in 100 lines of code.

Python 1,630 171 Updated Oct 12, 2025

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)

Python 8,180 800 Updated Oct 9, 2025

Minimal reproduction of DeepSeek R1-Zero

Python 12,273 1,512 Updated Apr 24, 2025

Instead of running one environment at a time or one per thread, run everything in batch using numpy on a single core.

Jupyter Notebook 5 2 Updated Feb 19, 2018

Fully open reproduction of DeepSeek-R1

Python 25,552 2,395 Updated Sep 8, 2025

🚀 Efficient implementations of state-of-the-art linear attention models

Python 3,515 267 Updated Oct 17, 2025

A PyTorch native platform for training generative AI models

Python 4,552 565 Updated Oct 17, 2025

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…

Python 2,814 523 Updated Oct 17, 2025

FlagGems is an operator library for large language models implemented in the Triton Language.

Python 697 144 Updated Oct 17, 2025

PyTorch native quantization and sparsity for training and inference

Python 2,425 348 Updated Oct 17, 2025

Development repository for the Triton language and compiler

MLIR 17,247 2,317 Updated Oct 17, 2025

Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.

Cuda 385 51 Updated Jan 2, 2025

TORCH_LOGS parser for PT2

Rust 62 20 Updated Sep 20, 2025

A very simple shared memory dict implementation

Python 173 23 Updated Aug 26, 2022

Simple, minimal implementation of the Mamba SSM in one file of PyTorch.

Python 2,864 210 Updated Mar 8, 2024

Seamless operability between C++11 and Python

C++ 17,359 2,226 Updated Oct 16, 2025

Implementation of Denoising Diffusion Probabilistic Model in PyTorch

Python 10,044 1,214 Updated Aug 4, 2025

Denoising Diffusion Probabilistic Models

Python 4,753 448 Updated Aug 29, 2023

A minimal PyTorch implementation of probabilistic diffusion models for 2D datasets.

Jupyter Notebook 939 74 Updated May 7, 2024

An open source implementation of CLIP.

Python 12,779 1,177 Updated Sep 21, 2025

PyTorch Implementation of OpenAI's Image GPT

Python 260 33 Updated Oct 3, 2023

Large Language Model-enhanced Recommender System Papers

722 58 Updated Aug 15, 2025

An unnecessarily tiny implementation of GPT-2 in NumPy.

Python 3,417 440 Updated Apr 24, 2023

Code and model for the paper "Improving Language Understanding by Generative Pre-Training"

Python 2,245 513 Updated Jan 25, 2019