Stars
A Datacenter Scale Distributed Inference Serving Framework
A simple C++11 Thread Pool implementation
User-friendly Desktop Client App for AI Models/LLMs (GPT, Claude, Gemini, Ollama...)
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
FlashInfer: Kernel Library for LLM Serving
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Accessible large language models via k-bit quantization for PyTorch.
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Official inference framework for 1-bit LLMs
SGLang is a fast serving framework for large language models and vision language models.
List of papers related to neural network quantization in recent AI conferences and journals.
An experimental CPU backend for Triton (https://github.com/openai/triton)
Development repository for the Triton language and compiler
A high-throughput and memory-efficient inference and serving engine for LLMs
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
PyTorch native quantization and sparsity for training and inference
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs. TensorR…
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
A Python package that extends official PyTorch to easily obtain performance gains on Intel platforms