Ongoing research training transformer models at scale

Python 13,943 3,186 Updated Oct 25, 2025

[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free

Python 934 47 Updated Jun 27, 2024

https://wavespeed.ai/ Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.

Python 1,286 86 Updated Mar 27, 2025

The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.

Python 91,946 10,326 Updated Oct 24, 2025

Notes on the knowledge and interview questions relevant to large language model (LLM) algorithm (application) engineers

HTML 10,540 1,078 Updated Apr 30, 2025

Notes on OOPS and Computer Networks

270 70 Updated Mar 4, 2025

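📚 A summary of basic knowledge for C/C++ technical interviews, covering the language, libraries, data structures, algorithms, systems, networking, and linking and loading of libraries, plus interview experience, job postings, and referral information. This repository is a summary of the basic knowledge of recruiting job seekers and beginners in the direction of C/C++ technology, in…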

C++ 37,032 8,121 Updated Aug 24, 2025

llama3 implementation one matrix multiplication at a time

Jupyter Notebook 15,179 1,292 Updated May 23, 2024

Video+code lecture on building nanoGPT from scratch

Python 4,466 704 Updated Aug 13, 2024

LLM101n: Let's build a Storyteller

35,281 1,912 Updated Aug 1, 2024

FlagGems is an operator library for large language models implemented in the Triton Language.

Python 703 145 Updated Oct 24, 2025

Inference Llama 2 in one file of pure C

C 18,873 2,394 Updated Aug 6, 2024

High-speed Large Language Model Serving for Local Deployment

C++ 8,369 448 Updated Aug 2, 2025

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Jupyter Notebook 2,648 186 Updated Jun 25, 2024

Reference implementation of the Transformer architecture optimized for Apple Neural Engine (ANE)

Python 2,661 91 Updated Apr 25, 2023

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.

Python 4,986 725 Updated Sep 22, 2025

AirLLM 70B inference with single 4GB GPU

Jupyter Notebook 6,270 480 Updated Sep 3, 2025

A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI

Python 771 77 Updated Dec 15, 2023

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 40,501 4,590 Updated Oct 25, 2025

ChatGLM-6B: An Open Bilingual Dialogue Language Model

Python 41,145 5,211 Updated Jun 27, 2024

Large Scale Chinese Corpus for NLP

9,792 1,561 Updated Sep 8, 2025

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

C++ 76,835 8,306 Updated May 27, 2025

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support

Python 9,230 1,214 Updated Oct 22, 2025

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 10,027 727 Updated Oct 17, 2025

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Python 2,204 183 Updated Mar 27, 2024

4 bits quantization of LLaMA using GPTQ

Python 3,075 461 Updated Jul 13, 2024

Instruct-tune LLaMA on consumer hardware

Jupyter Notebook 18,975 2,220 Updated Jul 29, 2024

LLM inference in C/C++

C++ 88,297 13,430 Updated Oct 25, 2025

Port of OpenAI's Whisper model in C/C++

C++ 44,044 4,864 Updated Oct 22, 2025

A VSCode extension that allows you to use ChatGPT

TypeScript 4,973 369 Updated Sep 29, 2023