Stars
Ongoing research training transformer models at scale
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
https://wavespeed.ai/ Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
Notes on the knowledge and interview questions relevant to large language model (LLM) algorithm (application) engineers
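📚 A summary of basic knowledge for C/C++ technical interviews, covering the language, libraries, data structures, algorithms, systems, networking, and linking and loading libraries, along with interview experience, recruitment, and referral information. This repository is a summary of the basic knowledge of recruiting job seekers and beginners in the direction of C/C++ technology, in…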
llama3 implementation one matrix multiplication at a time
Video+code lecture on building nanoGPT from scratch
FlagGems is an operator library for large language models implemented in the Triton Language.
High-speed Large Language Model Serving for Local Deployment
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Reference implementation of the Transformer architecture optimized for Apple Neural Engine (ANE)
Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
AirLLM: 70B inference with a single 4GB GPU
A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
ChatGLM-6B: An Open Bilingual Dialogue Language Model
Large-Scale Chinese Corpus for NLP
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
Hackable and optimized Transformers building blocks, supporting a composable construction.
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
4-bit quantization of LLaMA using GPTQ
Instruct-tune LLaMA on consumer hardware
Port of OpenAI's Whisper model in C/C++
A VSCode extension that allows you to use ChatGPT