Releases: oobabooga/text-generation-webui

v3.18

19 Nov 14:04
1afe082

Changes

  • Add --cpu-moe flag for llama.cpp to move MoE model experts to CPU, reducing VRAM usage.
  • Add ROCm portable builds for AMD GPUs on Linux, made possible by PR oobabooga/llama-cpp-binaries#7. Thanks, @ShortTimeNoSee.
  • Remove deprecated macOS 13 wheels (no longer supported by GitHub Actions).

Backend updates


Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the user_data folder in the new version with the one from your existing install. All your settings and models will be carried over.

v3.17

06 Nov 03:39
9ad9afa

Changes

  • Add weights_only=True to torch.load in Training_PRO for better security.
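
A minimal sketch of what this hardening looks like when loading a training checkpoint; the file path below is hypothetical:

```python
import torch

# weights_only=True restricts torch.load to plain tensor/primitive data,
# so a malicious pickle embedded in a checkpoint cannot execute code on load.
state_dict = torch.load("user_data/training/checkpoint.pt", weights_only=True)
```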

Bug fixes

  • Pin huggingface-hub to 0.36.0 to fix manual venv installs.
  • Rename 'evaluation_strategy' to 'eval_strategy' in training. Thanks, @inyourface34456.

Backend updates


Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel CPU: Use macos-x86_64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the user_data folder in the new version with the one from your existing install. All your settings and models will be carried over.

v3.16

23 Oct 15:50
fc67e5e

Changes

  • Make it possible to run a portable Web UI build via a symlink (#7277). Thanks, @reksar.

Bug fixes

  • Fix Python requirements for Apple devices running macOS Tahoe (#7273). Thanks, @drieschel.

Backend updates


Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4 for newer GPUs or cuda11.7 for older GPUs and systems with older drivers.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel CPU: Use macos-x86_64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the user_data folder in the new version with the one from your existing install. All your settings and models will be carried over.

v3.15

15 Oct 20:15
7711305

Changes

  • Log an error when a llama-server request exceeds the context size (#7263). Thanks, @mamei16.
  • Make --trust-remote-code immutable from the UI/API for better security.

Bug fixes

  • Fix metadata leaking into branched chats.
  • Fix "continue" missing an initial space in chat-instruct/chat modes.
  • Fix resuming incomplete downloads after HF moved to Xet.
  • Revert exllamav3_hf changes in v3.14 that made it output gibberish.

Backend updates


Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4 for newer GPUs or cuda11.7 for older GPUs and systems with older drivers.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel CPU: Use macos-x86_64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the user_data folder in the new version with the one from your existing install. All your settings and models will be carried over.

v3.14

10 Oct 13:47
7833650

Changes

  • Better handle multi-GPU setups when using Transformers with bitsandbytes (load-in-8bit and load-in-4bit).
  • Implement the /v1/internal/logits endpoint for the exllamav3 and exllamav3_hf loaders (see the example after this list).
  • Make profile picture uploading safer.
  • Add fla to the requirements for Exllamav3 to support qwen3-next models.
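
A hedged example of querying the new logits endpoint from Python; the request fields and the default port are assumptions based on a typical local install, not an authoritative schema:

```python
import requests

# Ask a locally running text-generation-webui instance for the top next-token
# logits of a prompt. Field names such as "top_logits" are illustrative;
# check the API documentation shipped with your install for the exact schema.
url = "http://127.0.0.1:5000/v1/internal/logits"
payload = {"prompt": "The capital of France is", "top_logits": 10}

response = requests.post(url, json=payload, timeout=60)
response.raise_for_status()
print(response.json())  # top candidate tokens and their logits
```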

Bug fixes

  • Fix an issue with loading certain chat histories in Instruct mode. Thanks, @Remowylliams.
  • Fix portable builds for macOS x86 missing llama.cpp binaries (#7238). Thanks, @IonoclastBrigham.

Backend updates


Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4 for newer GPUs or cuda11.7 for older GPUs and systems with older drivers.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel CPU: Use macos-x86_64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the user_data folder in the new version with the one from your existing install. All your settings and models will be carried over.

v3.13

21 Sep 04:19
042b828

Bug fixes

  • Don't use single $ $ delimiters for LaTeX, only $$ $$, to avoid broken rendering of text like "apples cost $1, oranges cost $2" (see the example after this list).
  • Fix exllamav3 ignoring the stop button.
  • Fix a transformers issue when using --bf16 and Flash Attention 2 (#7217). Thanks, @stevenxdavis.
  • Fix x86_64 macOS portable builds containing arm64 files.
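
To illustrate the delimiter rule with an arbitrary formula: a display equation is only rendered when written with double dollar signs, e.g.

$$ e^{i\pi} + 1 = 0 $$

while a sentence such as "apples cost $1, oranges cost $2" is now left as plain text instead of being pulled into math mode.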

Backend updates


Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4 for newer GPUs or cuda11.7 for older GPUs and systems with older drivers.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel CPU: Use macos-x86_64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the user_data folder in the new version with the one from your existing install. All your settings and models will be carried over.

v3.12

02 Sep 19:55
d3a7710

Changes

  • Characters can now think in chat-instruct mode! This was made possible by many simplifications and improvements to jinja2 template handling.
  • Add support for the Seed-OSS-36B-Instruct template.
  • Better handle the growth of the chat input textarea (before/after screenshots in the original release notes).
  • Make the --model flag work with absolute paths for GGUF models, like --model /tmp/gemma-3-270m-it-IQ4_NL.gguf.
  • Make venv portable installs work with Python 3.13
  • Optimize LaTeX rendering during streaming for long replies
  • Give streaming instruct messages more vertical space
  • Preload the instruct and chat fonts for smoother startup
  • Improve right sidebar borders in light mode
  • Remove the --flash-attn flag (it's always on now in llama.cpp)
  • Suppress "Attempted to select a non-interactive or hidden tab" console warnings, reducing the UI CPU usage during streaming
  • Statically link MSVC runtime to remove the Visual C++ Redistributable dependency on Windows for the llama.cpp binaries
  • Make the llama.cpp terminal output with --verbose less verbose

Bug fixes

  • llama.cpp: Fix stderr deadlock while loading some models
  • llama.cpp: Fix obtaining the maximum sequence length for GPT-OSS
  • Fix the UI failing to launch if the Notebook prompt is too long
  • Fix LaTeX rendering for equations with asterisks
  • Fix italic and quote colors in headings

Backend updates


Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4 for newer GPUs or cuda11.7 for older GPUs and systems with older drivers.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel CPU: Use macos-x86_64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the user_data folder in the new version with the one from your existing install. All your settings and models will be carried over.

v3.11

19 Aug 14:52
cb00db1

Changes

  • Add the Tensor Parallelism option to the ExLlamav3/ExLlamav3_HF loaders through the --enable-tp and --tp-backend options.
  • Set multimodal status during Model Loading instead of checking every generation (#7199). Thanks, @altoiddealer.
  • Improve the multimodal API examples slightly.

Bug fixes

  • Make web search functional again
  • mtmd: Fix a bug when "include past attachments" is unchecked
  • Fix code blocks having an extra empty line in the UI

Backend updates


Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4 for newer GPUs or cuda11.7 for older GPUs and systems with older drivers.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel CPU: Use macos-x86_64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the user_data folder in the new version with the one from your existing install. All your settings and models will be carried over.

v3.10 - Multimodal support!

12 Aug 21:18
6c2fdfd

See the Multimodal Tutorial

Changes

  • Add multimodal support to the UI and API (see the example after this list).
  • Add speculative decoding to the new ExLlamaV3 loader.
  • Use ExLlamav3 instead of ExLlamav3_HF by default for EXL3 models, since it supports multimodal and speculative decoding.
  • Support loading chat templates from chat_template.json files (EXL3/EXL2/Transformers models)
  • Default max_tokens to 512 in the API instead of 16
  • Better organize the right sidebar in the UI
  • llama.cpp: Pass --swa-full to llama-server when streaming-llm is checked to make it work for models with SWA.
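
A hedged sketch of a multimodal request against the OpenAI-compatible API of a local install, also showing the new max_tokens default being set explicitly. The base64 "image_url" content part, the port, and the file name are assumptions modeled on the OpenAI message format; defer to the Multimodal Tutorial for the exact shape:

```python
import base64
import requests

# Encode a local image and attach it to a chat completion request.
# "photo.jpg" and the endpoint details are illustrative.
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    "max_tokens": 512,  # now the API default; set explicitly to override
}

r = requests.post("http://127.0.0.1:5000/v1/chat/completions", json=payload, timeout=120)
print(r.json()["choices"][0]["message"]["content"])
```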

Bug fixes

  • Fix getting the ctx-size for newer EXL3/EXL2/Transformers models
  • Fix the exllamav2 loader ignoring add_bos_token
  • Fix the color of italic text in chat messages
  • Fix edit window and buttons in Messenger theme (#7100). Thanks, @mykeehu.

Backend updates


Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4 for newer GPUs or cuda11.7 for older GPUs and systems with older drivers.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel CPU: Use macos-x86_64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the user_data folder in the new version with the one from your existing install. All your settings and models will be carried over.

v3.9.1

07 Aug 03:33
88ba4b1

Changes


Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4 for newer GPUs or cuda11.7 for older GPUs and systems with older drivers.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel CPU: Use macos-x86_64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the user_data folder in the new version with the one from your existing install. All your settings and models will be carried over.