📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 4,883 330 Updated Nov 28, 2025

A command-line interface tool for serving LLMs using vLLM.

Python 458 24 Updated Dec 3, 2025

Intel® AI Assistant Builder

JavaScript 139 27 Updated Dec 31, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust 5,708 761 Updated Jan 3, 2026

✔ (Completed) The most comprehensive deep learning notes [Tudui: PyTorch] [Mu Li: Dive into Deep Learning] [Andrew Ng: Deep Learning] [Dafei: LLM Agents]

Jupyter Notebook 15,697 1,827 Updated Dec 31, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 9,135 901 Updated Dec 24, 2025

Token level visualization tools for large language models

Python 90 11 Updated Jan 8, 2025

OpenAI Triton backend for Intel® GPUs

MLIR 223 81 Updated Jan 2, 2026

Production-ready platform for agentic workflow development.

Python 124,420 19,345 Updated Jan 2, 2026
4 Updated Apr 8, 2024

Basic installation and use of Gemma3 via Ollama in Colab

Jupyter Notebook 4 1 Updated Jun 3, 2025

An open-source AI agent that brings the power of Gemini directly into your terminal.

TypeScript 89,478 10,312 Updated Jan 3, 2026

Model Context Protocol Servers

TypeScript 75,393 9,142 Updated Dec 19, 2025

A simple, secure MCP-to-OpenAPI proxy server

Python 3,823 430 Updated Dec 8, 2025

Ollama with Intel (i)GPU acceleration in Docker, plus benchmarks

Python 30 7 Updated Dec 24, 2025

A monitor of resources

C++ 29,364 881 Updated Dec 30, 2025

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python 5,170 1,810 Updated Feb 26, 2025

Happy experimenting with MLLM and LLM models!

Jupyter Notebook 128 30 Updated Oct 16, 2024
Dockerfile 257 61 Updated Jun 4, 2025

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

72,088 8,278 Updated Dec 22, 2025

MLPerf® Storage Benchmark Suite

Python 171 54 Updated Dec 22, 2025

Bjorn Services: an AI microservices suite. LLaVA and BridgeTower component

Python 1 Updated Jul 19, 2024

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discr…

Python 8,585 1,395 Updated Oct 14, 2025

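AIInfra (AI infrastructure) covers the AI system stack, from low-level hardware such as chips up to the software stack, that supports training and inference of large AI models.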

Jupyter Notebook 5,586 769 Updated Dec 22, 2025

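A collection of AIGC-interview/CV-interview/LLMs-interview questions and answers, which also includes new ideas, new questions, new resources, and new projects from work and research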

2,722 242 Updated Oct 30, 2025

Vision agent

Python 5,190 582 Updated Dec 3, 2025

The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.

Python 98,826 11,213 Updated Jan 3, 2026

Advanced Matrix Extensions (AMX) Guide

C++ 108 8 Updated Jan 11, 2022

Knowledge Base QA using RAG pipeline on Intel GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) with BigDL-LLM

Python 1 Updated Jun 12, 2024

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

Python 2,170 216 Updated Oct 8, 2024