A ~950-line, minimal, extensible LLM inference engine built from scratch.

Python 164 13 Updated Jan 9, 2026

MoE training for Me and You and maybe other people

Python 315 28 Updated Jan 3, 2026

My learning notes for ML SYS.

Python 5,000 325 Updated Jan 8, 2026

kernels, of the mega variety

Python 641 35 Updated Sep 28, 2025

A framework for the evaluation of autoregressive code generation language models.

Python 1,015 252 Updated Jul 22, 2025

Code for the paper "Efficient Training of Language Models to Fill in the Middle"

Python 194 43 Updated Apr 2, 2023

Code for the paper "Evaluating Large Language Models Trained on Code"

Python 3,076 430 Updated Jan 17, 2025

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?

Python 1,420 116 Updated Nov 13, 2025

NanoGPT (124M) in 3 minutes

Python 4,114 550 Updated Jan 7, 2026

Official repository for LiteTracker: Leveraging Temporal Causality for Accurate Low-latency Tissue Tracking; published at MICCAI 2025.

Python 201 12 Updated Nov 12, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 6,046 793 Updated Jan 6, 2026

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 96,476 26,471 Updated Jan 10, 2026

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 41,188 4,680 Updated Jan 9, 2026

a small protein language model based off of nanochat

Python 2 Updated Oct 20, 2025

PyTorch native quantization and sparsity for training and inference

Python 2,614 395 Updated Jan 10, 2026

FlashAttention written in metal-cpp headers

Makefile 4 1 Updated Oct 5, 2025

The Modular Platform (includes MAX & Mojo)

Mojo 25,427 2,758 Updated Jan 9, 2026

The simplest, fastest repository for training/finetuning small-sized VLMs.

Python 4,498 438 Updated Oct 27, 2025

A PyTorch native platform for training generative AI models

Python 4,943 663 Updated Jan 9, 2026

PyTorch building blocks for the OLMo ecosystem

Python 672 119 Updated Jan 10, 2026

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python 3,072 603 Updated Jan 10, 2026

Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)

Python 467 54 Updated Dec 26, 2025

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python 706 91 Updated Jan 10, 2026

A Quirky Assortment of CuTe Kernels

Python 743 70 Updated Jan 7, 2026

Python SQL Parser and Transpiler

Python 8,791 1,045 Updated Jan 9, 2026

RL gym for vision language models written in JAX

Python 139 12 Updated Oct 30, 2025

CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge techniques in sparse architecture, speculative sampling and qua…

Cuda 216 20 Updated Oct 10, 2025
Python 80 6 Updated Dec 2, 2025