A ComfyUI node that lets you select the Flash Attention Triton implementation as the sampling attention.
llama.cpp Public
Forked from ggml-org/llama.cpp. Port of Facebook's LLaMA model in C/C++
C, MIT License, Updated Jul 29, 2024
exllamav2 Public
Forked from turboderp-org/exllamav2. A fast inference library for running LLMs locally on modern consumer-class GPUs
Python, MIT License, Updated Nov 10, 2023
whisper.cpp Public
Forked from ggml-org/whisper.cpp. Port of OpenAI's Whisper model in C/C++
C, MIT License, Updated Aug 25, 2023
exllama Public
Forked from turboderp/exllama. A more memory-efficient rewrite of the HF Transformers implementation of Llama for use with quantized weights.
text-generation-webui Public
Forked from oobabooga/text-generation-webui. A Gradio web UI for running large language models like LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA.
Python, GNU Affero General Public License v3.0, Updated Jun 28, 2023