Starred repositories
Generate a timeline of your day, automatically
A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment…
A curated list of materials on AI efficiency
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
A configuration framework that enhances Claude Code with specialized commands, cognitive personas, and development methodologies.
Complete solutions to Programming Massively Parallel Processors, 4th Edition
Trae Agent is an LLM-based agent for general purpose software engineering tasks.
Lower-latency OpenMP-style minimalistic scoped thread-pool designed for 'Fork-Join' parallelism in Rust and C++, avoiding memory allocations, mutexes, CAS-primitives, and false-sharing on the hot p…
KAI Scheduler is an open-source, Kubernetes-native scheduler for large-scale AI workloads
Open-source implementation of AlphaEvolve
🔥Highlighting the top ML papers every week.
Curated resources for discovering, reading, and working with arXiv papers
Replace 'hub' with 'ingest' in any GitHub URL to get a prompt-friendly extract of a codebase
Produces a serialized hardware report of the physical infrastructure for automation
A curated list of awesome commands, files, and workflows for Claude Code
Collection of extracted System Prompts from popular chatbots like ChatGPT, Claude & Gemini
An interactive web-based tool for exploring intermediate representations of PyTorch and Triton models
User-friendly AI interface (supports Ollama, OpenAI API, ...)
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
A JAX research toolkit for building, editing, and visualizing neural networks.
A repository of personal notes and annotated papers collected during daily research.
A lightweight library for portable low-level GPU computation using WebGPU.
vLLM’s reference system for K8s-native, cluster-wide deployment, with community-driven performance optimization
Code for solving linear programs (LPs) on GPUs using first-order methods