Stars
You like pytorch? You like micrograd? You love tinygrad! ❤️
NVIDIA Linux open GPU kernel module source
Official JAX implementation of End-to-End Test-Time Training for Long Context
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama, Gemma, TTS 2x faster with 70% less VRAM.
Resource Multiplexing in Tuning and Serving Large Language Models (USENIX ATC 2025)
Naive attempt at implementing the TTT paper by letting autograd do the heavy lifting
Unofficial implementation of Titans, SOTA memory for transformers, in PyTorch
dspy-cli is a tool for creating, developing, testing, and deploying DSPy programs as HTTP APIs.
NVIDIA Linux open GPU kernel modules with P2P support
Artifact from "Hardware Compute Partitioning on NVIDIA GPUs". THIS IS A FORK OF BAKITA'S REPO. I AM NOT ONE OF THE AUTHORS OF THE PAPER.
Dynamic Memory Management for Serving LLMs without PagedAttention
The official implementation of the ICML 2024 paper "MemoryLLM: Towards Self-Updatable Large Language Models" and "M+: Extending MemoryLLM with Scalable Long-Term Memory"
The open-source RAG platform: built-in citations, deep research, 22+ file formats, partitions, MCP server, and more.
Hackable and optimized Transformers building blocks, supporting a composable construction.
A Datacenter-Scale Distributed Inference Serving Framework
Train transformer language models with reinforcement learning.
Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning"