mingfeima
:octocat:
i do not stand by in the presence of evil
  • Intel Asia-Pacific R&D


Nano vLLM

Python 6,565 810 Updated Aug 31, 2025
Python 791 41 Updated Aug 25, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 18,359 1,734 Updated Sep 11, 2025
C++ 291 24 Updated Sep 4, 2025
C++ 495 40 Updated Sep 12, 2025

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.

Python 43,717 3,613 Updated Sep 10, 2025

A trainable PyTorch reproduction of AlphaFold 3.

Python 1,309 185 Updated Sep 10, 2025

FlashMLA: Efficient MLA kernels

C++ 11,721 899 Updated Aug 27, 2025

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discr…

Python 8,310 1,374 Updated Sep 12, 2025

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 7,034 606 Updated Sep 12, 2025

Quantized Attention achieves speedups of 2-5x and 3-11x compared to FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.

Cuda 2,378 215 Updated Aug 5, 2025

Official inference framework for 1-bit LLMs

Python 21,949 1,687 Updated Jun 3, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 15,039 1,079 Updated Sep 12, 2025

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook 887 41 Updated Aug 12, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 3,928 371 Updated Sep 13, 2025

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Python 2,160 360 Updated Aug 14, 2025

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 6,220 461 Updated Aug 7, 2024

Universal LLM Deployment Engine with ML Compilation

Python 21,314 1,816 Updated Sep 13, 2025

A lightweight LLM inference framework

C++ 739 92 Updated Apr 7, 2024

The definitive Web UI for local AI, with powerful features and easy setup.

Python 44,946 5,776 Updated Sep 3, 2025

Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training

C++ 1,828 245 Updated Aug 31, 2025

Sparsity-aware deep learning inference runtime for CPUs

Python 3,159 190 Updated Jun 2, 2025

Low-bit LLM inference on CPU/NPU with lookup table

C++ 852 70 Updated Jun 5, 2025

Run PyTorch LLMs locally on servers, desktop and mobile

Python 3,609 251 Updated Sep 10, 2025

PyTorch native quantization and sparsity for training and inference

Python 2,350 336 Updated Sep 13, 2025

High-speed Large Language Model Serving for Local Deployment

C++ 8,328 444 Updated Aug 2, 2025

A generative speech model for daily dialogue.

Python 37,789 4,090 Updated Jul 6, 2025