- Nanjing University of Science and Technology (NJUST)
- Automation Building, No. 95, Zhongguancun East Road, Haidian District, Beijing
- https://www.njust.edu.cn/
LLM
Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models"
The official implementation of the NeurIPS 2022 paper Q-ViT.
FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.
ChatGLM3 series: Open Bilingual Chat LLMs | open-source bilingual dialogue language models
The definitive Web UI for local AI, with powerful features and easy setup.
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Provides a practical interactive interface for LLMs such as GPT/GLM, with special optimization for paper reading/polishing/writing. Modular design with support for custom shortcut buttons and function plugins; project analysis and self-translation for Python, C++, and other codebases; PDF/LaTeX paper translation and summarization; parallel querying of multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFLYTEK Spark, ERNIE Bot, llama2, rwkv, claude2, m…
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
Accessible large language models via k-bit quantization for PyTorch.
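A minimal sketch of the k-bit quantization workflow this entry refers to, using bitsandbytes through the transformers integration (requires a CUDA GPU); the model id and prompt are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 weight quantization with bfloat16 compute (bitsandbytes backend).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "facebook/opt-1.3b"  # placeholder; any causal LM on the Hub works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

inputs = tokenizer("Quantization reduces memory by", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```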
Reorder-based post-training quantization for large language models
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
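The core idea of activation-aware weight quantization can be sketched in a few lines: scale the weight channels that see large activations up before rounding (and fold the inverse scale back afterwards), so the most salient channels lose less precision. This is an illustrative sketch only, not the AWQ repo's algorithm; the scaling exponent and the per-row quantization grouping are arbitrary choices here.

```python
import torch

def awq_style_scale_and_quantize(weight, act_sample, n_bits=4, alpha=0.5):
    """Illustrative activation-aware scaling + round-to-nearest quantization
    (not the official AWQ implementation).

    weight:     (out_features, in_features) linear weight
    act_sample: (tokens, in_features) calibration activations for this layer
    """
    # Per-input-channel activation magnitude -> per-channel scale s.
    act_mag = act_sample.abs().mean(dim=0).clamp(min=1e-5)
    s = act_mag.pow(alpha)                 # salient channels get s > 1

    w_scaled = weight * s                  # protect salient channels before rounding
    qmax = 2 ** (n_bits - 1) - 1
    step = w_scaled.abs().amax(dim=1, keepdim=True) / qmax
    w_q = torch.clamp(torch.round(w_scaled / step), -qmax - 1, qmax) * step

    # Fold 1/s back so the quantized layer still approximates x @ W^T.
    return w_q / s, s

# Usage: w_q, s = awq_style_scale_and_quantize(linear.weight.data, calib_acts)
```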
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
PB-LLM: Partially Binarized Large Language Models
Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models".
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling
Implementation of the quantization method named 'ShiftCNN' in Caffe.
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
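To make the KV-cache entry above concrete, here is a minimal sketch of asymmetric (min/max, zero-point) low-bit quantization of a cached tensor along one axis; purely illustrative, not KIVI's code or its exact grouping scheme.

```python
import torch

def asym_quantize(x, n_bits=2, dim=-1):
    """Asymmetric (zero-point) quantization along `dim`.
    Sketch only; KIVI quantizes the key cache per-channel and the value cache per-token."""
    qmax = 2 ** n_bits - 1
    xmin = x.amin(dim=dim, keepdim=True)
    xmax = x.amax(dim=dim, keepdim=True)
    scale = (xmax - xmin).clamp(min=1e-8) / qmax
    zero = torch.round(-xmin / scale)
    q = torch.clamp(torch.round(x / scale) + zero, 0, qmax)
    return q.to(torch.uint8), scale, zero

def asym_dequantize(q, scale, zero):
    return (q.float() - zero) * scale

# Example: quantize a (batch, heads, seq, head_dim) key cache with per-channel statistics.
k = torch.randn(1, 8, 128, 64)
q, scale, zero = asym_quantize(k, n_bits=2, dim=-2)
k_hat = asym_dequantize(q, scale, zero)
```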
High-speed downloads from a mirror site using Hugging Face's official download tool.
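A hedged sketch of the same idea in Python: pointing Hugging Face's official downloader at a mirror via the HF_ENDPOINT environment variable and fetching a repo with snapshot_download; the mirror URL and repo id are examples, not endorsements of a specific mirror.

```python
import os

# Point the official Hugging Face downloader at a mirror (example endpoint).
# Must be set before importing huggingface_hub, which reads it at import time.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

from huggingface_hub import snapshot_download

# Download an entire model repo (example repo id) into the local cache.
local_dir = snapshot_download(repo_id="Qwen/Qwen2.5-0.5B")
print(local_dir)
```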
A framework for few-shot evaluation of language models.
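A hedged sketch of driving the harness from Python through its simple_evaluate entry point (the exact keyword arguments are an assumption based on recent versions); the checkpoint and task names are examples, chosen to show how a quantized model from the repos above might be scored.

```python
import lm_eval

# Evaluate a (possibly quantized) Hugging Face checkpoint on a single task.
# "hf" selects the Hugging Face backend; model and task names are examples.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=facebook/opt-125m",
    tasks=["hellaswag"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"]["hellaswag"])
```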
Efficient Riemannian Optimization on Stiefel Manifold via Cayley Transform
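The Cayley transform named above maps a skew-symmetric matrix A to an orthogonal matrix Q = (I - A)^{-1}(I + A), which is how such methods keep iterates on the Stiefel manifold without an explicit QR or SVD retraction. A minimal numerical sketch of that mapping (not the paper's optimizer):

```python
import torch

def cayley(A):
    """Cayley transform Q = (I - A)^{-1} (I + A) of a skew-symmetric matrix A.
    Illustrative sketch of the building block, not the paper's Stiefel optimizer."""
    n = A.shape[-1]
    I = torch.eye(n, dtype=A.dtype, device=A.device)
    return torch.linalg.solve(I - A, I + A)

# Build a skew-symmetric matrix and check that its Cayley image is orthogonal.
M = torch.randn(6, 6, dtype=torch.float64)
A = M - M.T                      # skew-symmetric: A^T = -A
Q = cayley(A)
print(torch.allclose(Q.T @ Q, torch.eye(6, dtype=torch.float64), atol=1e-10))  # True
```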
Code for Neurips24 paper: QuaRot, an end-to-end 4-bit inference of large language models.
Code repo for the paper "SpinQuant LLM quantization with learned rotations"
Fast Hadamard transform in CUDA, with a PyTorch interface
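For reference, the Walsh–Hadamard transform that this repo fuses into a single CUDA kernel can be written as a short iterative butterfly in pure PyTorch; a slow but self-contained sketch (the package's own Python interface is not reproduced here):

```python
import torch

def walsh_hadamard(x):
    """Orthonormal Walsh-Hadamard transform along the last dimension.
    Pure-PyTorch reference; the repo above provides a fused CUDA kernel for this."""
    n = x.shape[-1]
    assert n & (n - 1) == 0, "length must be a power of two"
    h = 1
    y = x.clone()
    while h < n:
        y = y.view(*x.shape[:-1], n // (2 * h), 2, h)
        a, b = y[..., 0, :], y[..., 1, :]
        y = torch.stack((a + b, a - b), dim=-2)
        h *= 2
    return y.reshape(x.shape) / n ** 0.5

x = torch.randn(4, 8)
y = walsh_hadamard(x)
# The transform is orthogonal, so applying it twice recovers the input.
print(torch.allclose(walsh_hadamard(y), x, atol=1e-6))
```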