- Nanjing University of Science and Technology (NJUST)
- Automation Building, No. 95, Zhongguancun East Road, Haidian District, Beijing
- https://www.njust.edu.cn/
LLM
Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models"
The official implementation of the NeurIPS 2022 paper Q-ViT.
FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.
ChatGLM3 series: Open Bilingual Chat LLMs | open-source bilingual dialogue language models
The definitive Web UI for local AI, with powerful features and easy setup.
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Provides a practical interactive interface for LLMs such as GPT/GLM, with special optimization for paper reading/polishing/writing. Modular design with support for custom shortcut buttons and function plugins; project analysis and self-translation for Python, C++, and other codebases; PDF/LaTeX paper translation and summarization; parallel querying of multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFLYTEK Spark, ERNIE Bot, llama2, rwkv, claude2, m…
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
Accessible large language models via k-bit quantization for PyTorch.
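A minimal sketch of the k-bit quantization workflow this entry refers to, using bitsandbytes through the transformers integration (requires a CUDA GPU); the model id and prompt are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 weight quantization with bfloat16 compute (bitsandbytes backend).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "facebook/opt-1.3b"  # placeholder; any causal LM on the Hub works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

inputs = tokenizer("Quantization reduces memory by", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```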
Reorder-based post-training quantization for large language models
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
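The core idea of activation-aware weight quantization can be sketched in a few lines: scale the weight channels that see large activations up before rounding (and fold the inverse scale back afterwards), so the most salient channels lose less precision. This is an illustrative sketch only, not the AWQ repo's algorithm; the scaling exponent and the per-row quantization grouping are arbitrary choices here.

```python
import torch

def awq_style_scale_and_quantize(weight, act_sample, n_bits=4, alpha=0.5):
    """Illustrative activation-aware scaling + round-to-nearest quantization
    (not the official AWQ implementation).

    weight:     (out_features, in_features) linear weight
    act_sample: (tokens, in_features) calibration activations for this layer
    """
    # Per-input-channel activation magnitude -> per-channel scale s.
    act_mag = act_sample.abs().mean(dim=0).clamp(min=1e-5)
    s = act_mag.pow(alpha)                 # salient channels get s > 1

    w_scaled = weight * s                  # protect salient channels before rounding
    qmax = 2 ** (n_bits - 1) - 1
    step = w_scaled.abs().amax(dim=1, keepdim=True) / qmax
    w_q = torch.clamp(torch.round(w_scaled / step), -qmax - 1, qmax) * step

    # Fold 1/s back so the quantized layer still approximates x @ W^T.
    return w_q / s, s

# Usage: w_q, s = awq_style_scale_and_quantize(linear.weight.data, calib_acts)
```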
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
PB-LLM: Partially Binarized Large Language Models
Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models".
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling
Implementation of the quantization method named 'ShiftCNN' in Caffe.
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
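To make the KV-cache entry above concrete, here is a minimal sketch of asymmetric (min/max, zero-point) low-bit quantization of a cached tensor along one axis; purely illustrative, not KIVI's code or its exact grouping scheme.

```python
import torch

def asym_quantize(x, n_bits=2, dim=-1):
    """Asymmetric (zero-point) quantization along `dim`.
    Sketch only; KIVI quantizes the key cache per-channel and the value cache per-token."""
    qmax = 2 ** n_bits - 1
    xmin = x.amin(dim=dim, keepdim=True)
    xmax = x.amax(dim=dim, keepdim=True)
    scale = (xmax - xmin).clamp(min=1e-8) / qmax
    zero = torch.round(-xmin / scale)
    q = torch.clamp(torch.round(x / scale) + zero, 0, qmax)
    return q.to(torch.uint8), scale, zero

def asym_dequantize(q, scale, zero):
    return (q.float() - zero) * scale

# Example: quantize a (batch, heads, seq, head_dim) key cache with per-channel statistics.
k = torch.randn(1, 8, 128, 64)
q, scale, zero = asym_quantize(k, n_bits=2, dim=-2)
k_hat = asym_dequantize(q, scale, zero)
```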
High-speed downloads from a mirror site using Hugging Face's official download tool.
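A hedged sketch of the same idea in Python: pointing Hugging Face's official downloader at a mirror via the HF_ENDPOINT environment variable and fetching a repo with snapshot_download; the mirror URL and repo id are examples, not endorsements of a specific mirror.

```python
import os

# Point the official Hugging Face downloader at a mirror (example endpoint).
# Must be set before importing huggingface_hub, which reads it at import time.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

from huggingface_hub import snapshot_download

# Download an entire model repo (example repo id) into the local cache.
local_dir = snapshot_download(repo_id="Qwen/Qwen2.5-0.5B")
print(local_dir)
```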
A framework for few-shot evaluation of language models.
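A hedged sketch of driving the harness from Python through its simple_evaluate entry point (the exact keyword arguments are an assumption based on recent versions); the checkpoint and task names are examples, chosen to show how a quantized model from the repos above might be scored.

```python
import lm_eval

# Evaluate a (possibly quantized) Hugging Face checkpoint on a single task.
# "hf" selects the Hugging Face backend; model and task names are examples.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=facebook/opt-125m",
    tasks=["hellaswag"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"]["hellaswag"])
```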
Efficient Riemannian Optimization on Stiefel Manifold via Cayley Transform
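The Cayley transform named above maps a skew-symmetric matrix A to an orthogonal matrix Q = (I - A)^{-1}(I + A), which is how such methods keep iterates on the Stiefel manifold without an explicit QR or SVD retraction. A minimal numerical sketch of that mapping (not the paper's optimizer):

```python
import torch

def cayley(A):
    """Cayley transform Q = (I - A)^{-1} (I + A) of a skew-symmetric matrix A.
    Illustrative sketch of the building block, not the paper's Stiefel optimizer."""
    n = A.shape[-1]
    I = torch.eye(n, dtype=A.dtype, device=A.device)
    return torch.linalg.solve(I - A, I + A)

# Build a skew-symmetric matrix and check that its Cayley image is orthogonal.
M = torch.randn(6, 6, dtype=torch.float64)
A = M - M.T                      # skew-symmetric: A^T = -A
Q = cayley(A)
print(torch.allclose(Q.T @ Q, torch.eye(6, dtype=torch.float64), atol=1e-10))  # True
```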
Code for Neurips24 paper: QuaRot, an end-to-end 4-bit inference of large language models.
Code repo for the paper "SpinQuant LLM quantization with learned rotations"
Fast Hadamard transform in CUDA, with a PyTorch interface
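For reference, the Walsh–Hadamard transform that this repo fuses into a single CUDA kernel can be written as a short iterative butterfly in pure PyTorch; a slow but self-contained sketch (the package's own Python interface is not reproduced here):

```python
import torch

def walsh_hadamard(x):
    """Orthonormal Walsh-Hadamard transform along the last dimension.
    Pure-PyTorch reference; the repo above provides a fused CUDA kernel for this."""
    n = x.shape[-1]
    assert n & (n - 1) == 0, "length must be a power of two"
    h = 1
    y = x.clone()
    while h < n:
        y = y.view(*x.shape[:-1], n // (2 * h), 2, h)
        a, b = y[..., 0, :], y[..., 1, :]
        y = torch.stack((a + b, a - b), dim=-2)
        h *= 2
    return y.reshape(x.shape) / n ** 0.5

x = torch.randn(4, 8)
y = walsh_hadamard(x)
# The transform is orthogonal, so applying it twice recovers the input.
print(torch.allclose(walsh_hadamard(y), x, atol=1e-6))
```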