
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720

Python 1,339 56 Updated Nov 18, 2025

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Python 10,184 981 Updated Jul 1, 2024

Open-Source Models for VLA Arena

Python 6 Updated Nov 17, 2025

Distributed MoE in a Single Kernel [NeurIPS '25]

Cuda 145 14 Updated Sep 30, 2025

Minimal PyTorch implementation of TP, SP, FSDP and sharded-EMA

Python 29 2 Updated Nov 16, 2025
Python 97 4 Updated Nov 19, 2025

CUTLASS and CuTe Examples

Cuda 107 13 Updated Nov 25, 2025

🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation feedback, cross-platform NVIDIA/AMD, Kernelbook + KernelBench

Python 102 2 Updated Nov 10, 2025

Minimal implementation of EDM (Elucidating the Design Space of Diffusion-Based Generative Models) on CIFAR-10 and MNIST

Python 64 5 Updated Dec 16, 2023
Python 144 10 Updated Oct 31, 2025

Code for "What really matters in matrix-whitening optimizers?"

Python 17 1 Updated Oct 31, 2025

Fast CUDA matrix multiplication from scratch

Cuda 960 143 Updated Sep 2, 2025

Minimal energy-based transformer

Python 40 3 Updated Nov 2, 2025

A foundation model to learn multiple physical systems at once

Python 69 5 Updated Nov 26, 2025

Code for "Transitive RL: Value Learning via Divide and Conquer"

Python 39 2 Updated Oct 31, 2025

🎨 Native AI image generation for Apple Silicon with Qwen-Image. Lightning LoRA acceleration for fast 4–8 step runs. Zero Docker, just works.

Python 9 Updated Sep 15, 2025

Chimera: State Space Models Beyond Sequences

Jupyter Notebook 6 Updated Oct 15, 2025

Library for FO Optimization

Jupyter Notebook 2 Updated Oct 12, 2025

torchax is a PyTorch frontend for JAX. It gives JAX the ability to author JAX programs using familiar PyTorch syntax. It also provides JAX-Pytorch interoperability, meaning, one can mix JAX & Pytor…

Python 133 15 Updated Nov 25, 2025

TPU inference for vLLM, with unified JAX and PyTorch support.

Python 167 42 Updated Nov 26, 2025
Python 147 14 Updated Dec 27, 2024

The official implementation of "Dual Goal Representations"

Python 23 1 Updated Oct 7, 2025

The best ChatGPT that $100 can buy.

Python 37,596 4,607 Updated Nov 17, 2025

StreamingVLM: Real-Time Understanding for Infinite Video Streams

Python 726 45 Updated Oct 15, 2025

The simplest, fastest repository for training/finetuning small-sized VLMs.

Python 4,315 417 Updated Oct 27, 2025

Unofficial PyTorch implementation of the paper "Mean Flows for One-step Generative Modeling" by Geng et al.

Python 939 55 Updated Oct 16, 2025

A JAX-native LLM Post-Training Library

Python 1,918 176 Updated Nov 26, 2025

Fast Diffusion Models with Transformers

Python 900 118 Updated Aug 17, 2025

Large multi-modal models (L3M) pre-training.

Python 221 12 Updated Sep 22, 2025