Starred repositories
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
Run frontier LLMs and VLMs with day-0 model support across GPU, NPU, and CPU, with comprehensive runtime coverage for PC (Python/C++), mobile (Android & iOS), and Linux/IoT (Arm64 & x86 Docker). Su…
STEP-GUI: The top GUI agent solution in the galaxy. Developed by the StepFun-GELab team and powered by StepFun’s cutting-edge research capabilities.
An Open Phone Agent Model & Framework. Unlocking the AI Phone for Everyone
zlib replacement with optimizations for "next generation" systems.
Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference in pure C/C++
Karabiner-Elements is a powerful tool for customizing keyboards on macOS
Performance-portable, length-agnostic SIMD with runtime dispatch
An introductory LLM tutorial for developers: the Chinese edition of Andrew Ng's large language model course series
LiteRT, successor to TensorFlow Lite, is Google's on-device framework for high-performance ML & GenAI deployment on edge platforms, via efficient conversion, runtime, and optimization
Automate your mobile devices with natural language commands - an LLM agnostic mobile Agent 🤖
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…
A machine learning accelerator core designed for energy-efficient AI at the edge.
Userspace/GPU eBPF VM with LLVM JIT/AOT compiler
Self-implemented NN operators for Qualcomm's Hexagon NPU
Kernels & AI inference engine for mobile devices.
Tools to set up a quick macOS VM in QEMU, accelerated by KVM.
On-device AI across mobile, embedded and edge for PyTorch
A SOTA open-source image editing model that aims to match the performance of closed-source models such as GPT-4o and Gemini 2 Flash.