Deep Learning Framework Developer, focusing on model optimization with quantization, distillation, and pruning
Intel
Shanghai
09:02 (UTC +08:00)
An innovative library for efficient LLM inference via low-bit quantization
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
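For readers unfamiliar with what low-bit quantization means in practice, the following is a minimal sketch using plain PyTorch's built-in dynamic INT8 quantization; the toy model and layer choices are illustrative assumptions, and this snippet is not the library's own API.

```python
# Minimal sketch: post-training dynamic INT8 quantization in plain PyTorch.
# The model here is a toy example, not taken from the library above.
import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic

# FP32 baseline model; dynamic quantization stores Linear weights as INT8
# and quantizes activations on the fly at inference time.
fp32_model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

int8_model = quantize_dynamic(fp32_model, qconfig_spec={nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 128)
print(int8_model)            # Linear layers are replaced by DynamicQuantizedLinear
print(int8_model(x).shape)   # outputs remain FP32 tensors, here shape (1, 10)
```

Weight-only INT8 of this kind is the simplest entry point; the lower-bit formats listed above (INT4, MXFP4, NVFP4, etc.) follow the same idea with more aggressive weight encodings.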
Faster R-CNN (Python implementation) -- see https://github.com/ShaoqingRen/faster_rcnn for the official MATLAB version
A Python script that automates the training of a CNN, compresses it through the TensorFlow (or Ristretto) plugin, and compares the performance of the two networks
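As a rough illustration of that train-compress-compare workflow, here is a hedged sketch that compresses a toy network with dynamic INT8 quantization and then compares the serialized size and outputs of the two networks; the model, the helper `serialized_size_mb`, and the choice of compression step are assumptions for illustration, not taken from the repository.

```python
# Sketch of a compress-and-compare step: original FP32 network vs. its
# INT8-quantized counterpart. All names here are illustrative.
import io
import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic

def serialized_size_mb(model: nn.Module) -> float:
    """Approximate serialized size of a model's state_dict in megabytes."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    return buffer.getbuffer().nbytes / 1e6

fp32_model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
int8_model = quantize_dynamic(fp32_model, qconfig_spec={nn.Linear}, dtype=torch.qint8)

x = torch.randn(8, 512)
with torch.no_grad():
    # Largest elementwise deviation between the two networks' outputs.
    drift = (fp32_model(x) - int8_model(x)).abs().max().item()

print(f"FP32 size: {serialized_size_mb(fp32_model):.2f} MB")
print(f"INT8 size: {serialized_size_mb(int8_model):.2f} MB")
print(f"max abs output difference: {drift:.4f}")
```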
pmgysel / caffe
Forked from BVLC/caffe
Ristretto: Caffe-based approximation of convolutional neural networks.