Stars
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
A high-quality, one-stop, open-source data extraction tool for converting PDF to Markdown and JSON.
A trainable PyTorch reproduction of AlphaFold 3.
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discr…
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Quantized attention that achieves speedups of 2-5x and 3-11x compared to FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.
Official inference framework for 1-bit LLMs
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
A throughput-oriented high-performance serving framework for LLMs
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
deepspeedai / Megatron-DeepSpeed
Forked from NVIDIA/Megatron-LM. Ongoing research training transformer language models at scale, including: BERT & GPT-2
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Universal LLM Deployment Engine with ML Compilation
The definitive Web UI for local AI, with powerful features and easy setup.
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
Sparsity-aware deep learning inference runtime for CPUs
Run PyTorch LLMs locally on servers, desktop and mobile
PyTorch native quantization and sparsity for training and inference
High-speed Large Language Model Serving for Local Deployment
A generative speech model for daily dialogue.