mingfeima

i do not stand by in the presence of evil

Ma Mingfei mingfeima

PyTorch Optimization on Intel platform

230 followers · 9 following

Intel Asia-Pacific R&D

Achievements

Stars

GeeeekExplorer / nano-vllm

Nano vLLM

Python 6,565 810 Updated Aug 31, 2025

ByteDance-Seed / seed-oss

Python 791 41 Updated Aug 25, 2025

openai / gpt-oss

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 18,359 1,734 Updated Sep 11, 2025

stepfun-ai / StepMesh

C++ 291 24 Updated Sep 4, 2025

Tencent / KsanaLLM

C++ 495 40 Updated Sep 12, 2025

opendatalab / MinerU

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.

Python 43,717 3,613 Updated Sep 10, 2025

bytedance / Protenix

A trainable PyTorch reproduction of AlphaFold 3.

Python 1,309 185 Updated Sep 10, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient MLA kernels

C++ 11,721 899 Updated Aug 27, 2025

deepseek-ai / DeepSeek-V3

Python 99,232 16,197 Updated Aug 28, 2025

intel / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discr…

Python 8,310 1,374 Updated Sep 12, 2025

InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 7,034 606 Updated Sep 12, 2025

thu-ml / SageAttention

Quantized Attention achieves speedups of 2-5x and 3-11x compared to FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.

Cuda 2,378 215 Updated Aug 5, 2025

microsoft / BitNet

Official inference framework for 1-bit LLMs

Python 21,949 1,687 Updated Jun 3, 2025

kvcache-ai / ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 15,039 1,079 Updated Sep 12, 2025

efeslab / Nanoflow

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook 887 41 Updated Aug 12, 2025

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 3,928 371 Updated Sep 13, 2025

deepspeedai / Megatron-DeepSpeed

Forked from NVIDIA/Megatron-LM

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Python 2,160 360 Updated Aug 14, 2025

QwenLM / Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 6,220 461 Updated Aug 7, 2024

mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation

Python 21,314 1,816 Updated Sep 13, 2025

aws-samples / genai-llm-cpu-sagemaker

Python 16 5 Updated Jun 25, 2024

MegEngine / InferLLM

A lightweight LLM inference framework

C++ 739 92 Updated Apr 7, 2024

oobabooga / text-generation-webui

The definitive Web UI for local AI, with powerful features and easy setup.

Python 44,946 5,776 Updated Sep 3, 2025

flexflow / flexflow-train

Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training

C++ 1,828 245 Updated Aug 31, 2025

neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs

Python 3,159 190 Updated Jun 2, 2025

microsoft / T-MAC

Low-bit LLM inference on CPU/NPU with lookup table

C++ 852 70 Updated Jun 5, 2025

pytorch / torchchat

Run PyTorch LLMs locally on servers, desktop and mobile

Python 3,609 251 Updated Sep 10, 2025

pytorch / ao

PyTorch native quantization and sparsity for training and inference

Python 2,350 336 Updated Sep 13, 2025

SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving for Local Deployment

C++ 8,328 444 Updated Aug 2, 2025

YavorGIvanov / sam.cpp

C++ 1,278 61 Updated Oct 24, 2023

2noise / ChatTTS

A generative speech model for daily dialogue.

Python 37,789 4,090 Updated Jul 6, 2025