Musubi Tuner

Introduction

This repository provides scripts for training LoRA (Low-Rank Adaptation) models with HunyuanVideo, Wan2.1/2.2, FramePack, FLUX.1 Kontext, and Qwen-Image architectures.

This repository is unofficial and not affiliated with the official HunyuanVideo/Wan2.1/2.2/FramePack/FLUX.1 Kontext/Qwen-Image repositories.

This repository is under development.

Support the Project

If you find this project helpful, please consider supporting its development via GitHub Sponsors. Your support is greatly appreciated!

Recent Updates

GitHub Discussions Enabled: We've enabled GitHub Discussions for community Q&A, knowledge sharing, and technical information exchange. Please use Issues for bug reports and feature requests, and Discussions for questions and sharing experiences. Join the conversation →

November 2, 2025
- Added --use_pinned_memory_for_block_swap option to each training script and improved the block swap process itself. See PR #700.
  - When specified, this option uses pinned memory for block swap offloading. This may improve block swap performance. However, on Windows environments, it increases shared GPU memory usage. Please refer to the documentation for details.
  - Since in some environments it may be faster not to specify --use_pinned_memory_for_block_swap, please try both options.
October 26, 2025
- Fixed a bug in Qwen-Image training where attention calculations were incorrect when the batch size was 2 or more and --split_attn was not specified. See PR #688.
- Added --disable_numpy_memmap option to Wan, FramePack, and Qwen-Image training and inference scripts. Thank you FurkanGozukara for PR #681. Also see PR #687.
  - When specified, this option disables numpy memory mapping during model loading. This may speed up model loading in some environments (e.g., RunPod), but increases RAM usage.
October 25, 2025
- Fixed a bug in image datasets with control images where the combination of target and control images was not loaded correctly. See PR #684.
  - If you are using an image dataset with control images, please recreate the latent cache.
  - Previously, only the first matching target was considered. For example, with target images a.png and ab.png and control images a_1.png and ab_1.png, both a_1.png and ab_1.png were paired with a.png.
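
The pairing problem in the October 25 entry can be illustrated with a minimal sketch. This is not the repository's actual loader: the function names and the `_<index>` suffix rule are assumptions inferred from the file names in the example. It contrasts a first-prefix match with a match on the whole stem:

```python
from pathlib import Path


def match_by_first_prefix(targets, controls):
    """Buggy pairing: each control goes to the first target whose stem is a prefix."""
    pairs = {t: [] for t in targets}
    for c in controls:
        for t in targets:
            if Path(c).stem.startswith(Path(t).stem):
                pairs[t].append(c)
                break  # first match wins, even when a longer stem fits better
    return pairs


def match_by_exact_stem(targets, controls):
    """Stricter pairing: strip the trailing "_<index>" and compare whole stems."""
    pairs = {t: [] for t in targets}
    stems = {Path(t).stem: t for t in targets}
    for c in controls:
        base, sep, index = Path(c).stem.rpartition("_")
        if sep and index.isdigit() and base in stems:
            pairs[stems[base]].append(c)
    return pairs


# File names taken from the example in the changelog entry.
targets = ["a.png", "ab.png"]
controls = ["a_1.png", "ab_1.png"]
buggy = match_by_first_prefix(targets, controls)
fixed = match_by_exact_stem(targets, controls)
print(buggy)  # a.png receives both control images
print(fixed)  # each target receives its own control image
```

Because cached latents store the pairing that was computed at cache time, a cache built with the old behavior keeps the wrong pairs, which is why the latent cache needs to be recreated.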
October 13, 2025
- Added the Reference Consistency Mask (RCM) feature to the Qwen-Image-Edit, 2509 inference script to improve pixel-level consistency of generated images. See PR #643.
  - RCM addresses the issue of slight positional drift in generated images compared to the control image. For details, refer to the Qwen-Image documentation.
- Fixed a bug where the control image was being resized to match the output image size even when the --resize_control_to_image_size option was not specified. This may change the generated images, so please check your options.
- FramePack 1-frame inference now includes the --one_frame_auto_resize option. See PR #646.
  - Automatically adjusts the resolution of the generated image. This option is only effective when --one_frame_inference is specified. For details, refer to the FramePack 1-frame inference documentation.

Releases

We are grateful to everyone who has been contributing to the Musubi Tuner ecosystem through documentation and third-party tools. To support these valuable contributions, we recommend working with our releases as stable reference points, as this project is under active development and breaking changes may occur.

You can find the latest release and version history in our releases page.

For Developers Using AI Coding Agents

This repository provides recommended instructions to help AI agents like Claude and Gemini understand our project context and coding standards.

To use them, you need to opt in by creating your own configuration file in the project root.

Quick Setup:

Create a CLAUDE.md and/or GEMINI.md file in the project root.
Add the following line to your CLAUDE.md to import the repository's recommended prompt (the Claude and Gemini prompts are currently almost identical):
```
@./.ai/claude.prompt.md
```
or for Gemini:
```
@./.ai/gemini.prompt.md
```
You can now add your own personal instructions below the import line (e.g., Always respond in Japanese.).

This approach ensures that you have full control over the instructions given to your agent while benefiting from the shared project context. CLAUDE.md and GEMINI.md are already listed in .gitignore, so they won't be committed to the repository.
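
If you prefer to script the setup, the steps above can be sketched as follows. The helper name `write_agent_config` is hypothetical and not part of the repository. The file names and import lines are the ones given above.

```python
from pathlib import Path

# Import lines as given in the Quick Setup steps.
IMPORT_LINES = {
    "CLAUDE.md": "@./.ai/claude.prompt.md",
    "GEMINI.md": "@./.ai/gemini.prompt.md",
}


def write_agent_config(project_root, filename, personal_notes=""):
    """Create the agent config with the import line first, then personal notes.

    Hypothetical helper: an existing file is left untouched so that personal
    instructions are never overwritten.
    """
    path = Path(project_root) / filename
    if path.exists():
        return path
    lines = [IMPORT_LINES[filename]]
    if personal_notes:
        lines += ["", personal_notes]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

For example, calling `write_agent_config(".", "CLAUDE.md", "Always respond in Japanese.")` from the project root would create CLAUDE.md with the import line followed by the personal instruction.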

Overview

Hardware Requirements

VRAM: 12GB or more recommended for image training, 24GB or more for video training
- Actual requirements depend on resolution and training settings. For 12GB, use a resolution of 960x544 or lower and use memory-saving options such as --blocks_to_swap, --fp8_llm, etc.
Main Memory: 64GB or more recommended, 32GB + swap may work

Features

Memory-efficient implementation
Windows compatibility confirmed (Linux compatibility confirmed by community)
Multi-GPU training (using Accelerate), documentation will be added later

Documentation

For detailed information on specific architectures, configurations, and advanced features, please refer to the documentation below.

Architecture-specific:

Common Configuration & Usage:

Installation

pip based installation

Python 3.10 or later is required (verified with 3.10).

Create a virtual environment and install PyTorch and torchvision matching your CUDA version.

PyTorch 2.5.1 or later is required (see the PyTorch version note under Miscellaneous).

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124

Install the required dependencies using the following command.

pip install -e .

Optionally, you can use FlashAttention and SageAttention (for inference only; see SageAttention Installation for installation instructions).

Optional dependencies for additional features:

ascii-magic: Used for dataset verification
matplotlib: Used for timestep visualization
tensorboard: Used for logging training progress
prompt-toolkit: Used for interactive prompt editing in Wan2.1 and FramePack inference scripts. If installed, it will be automatically used in interactive mode. Especially useful in Linux environments for easier prompt editing.

pip install ascii-magic matplotlib tensorboard prompt-toolkit
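
To see which of these optional packages are already present, a small check along the following lines may help. It is only a sketch: it uses the distribution names from the pip command above and the standard library's `importlib.metadata`, and the helper names are hypothetical.

```python
from importlib import metadata

# Distribution names from the pip command above, with the feature each enables.
OPTIONAL_PACKAGES = {
    "ascii-magic": "dataset verification",
    "matplotlib": "timestep visualization",
    "tensorboard": "training progress logging",
    "prompt-toolkit": "interactive prompt editing",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report_optional_packages(packages=OPTIONAL_PACKAGES):
    """Map each optional package to its installed version (None when missing)."""
    return {name: installed_version(name) for name in packages}


for name, version in report_optional_packages().items():
    status = version or "not installed"
    print(f"{name:15} {status:15} ({OPTIONAL_PACKAGES[name]})")
```

Run it inside the same virtual environment you use for training, since the result reflects only the active environment.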

uv based installation (experimental)

You can also install using uv, but installation with uv is experimental. Feedback is welcome.

Install uv (if not already present on your OS).

Linux/MacOS

curl -LsSf https://astral.sh/uv/install.sh | sh

Follow the installer's instructions to add uv to your PATH manually. This is only needed until you restart your session.

Windows

powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

Follow the installer's instructions to add uv to your PATH manually until you reboot, or simply reboot your system at this point.

Model Download

Model download procedures vary by architecture. Please refer to the architecture-specific documents in the Documentation section for instructions.

Usage

Dataset Configuration

Please refer to the dataset configuration documentation.

Pre-caching

Pre-caching procedures vary by architecture. Please refer to the architecture-specific documents in the Documentation section for instructions.

Configuration of Accelerate

Run accelerate config to configure Accelerate. Choose appropriate values for each question based on your environment: either type a value directly, or use the arrow keys and Enter to select. The uppercase option is the default, so press Enter without typing anything to accept it. For training with a single GPU, answer the questions as follows:

- In which compute environment are you running?: This machine
- Which type of machine are you using?: No distributed training
- Do you want to run your training on CPU only (even if a GPU / Apple Silicon / Ascend NPU device is available)?[yes/NO]: NO
- Do you wish to optimize your script with torch dynamo?[yes/NO]: NO
- Do you want to use DeepSpeed? [yes/NO]: NO
- What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]: all
- Would you like to enable numa efficiency? (Currently only supported on NVIDIA hardware). [yes/NO]: NO
- Do you wish to use mixed precision?: bf16

Note: In some cases, you may encounter the error ValueError: fp16 mixed precision requires a GPU. If this happens, answer "0" to the sixth question (What GPU(s) (by id) should be used for training on this machine as a comma-separated list? [all]:). This means that only the first GPU (id 0) will be used.
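
After answering the questions, you may want to confirm what was saved. The sketch below reads the resulting file and compares it with the single-GPU answers above. The file location and the key names (`mixed_precision`, `distributed_type`, `use_cpu`) are assumptions based on typical Accelerate versions and may differ in yours. The parser is a deliberately simple stand-in for a real YAML library.

```python
from pathlib import Path

# Assumed default location; Accelerate may store the file elsewhere
# (for example when HF_HOME is set), so adjust the path as needed.
DEFAULT_CONFIG = Path.home() / ".cache" / "huggingface" / "accelerate" / "default_config.yaml"


def parse_flat_yaml(text):
    """Parse top-level "key: value" lines only; enough for a quick inspection."""
    values = {}
    for line in text.splitlines():
        if not line or line[0] in " #-" or ":" not in line:
            continue  # skip blanks, comments, and nested or list entries
        key, _, value = line.partition(":")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_single_gpu_config(values):
    """Return warnings when the config differs from the answers listed above.

    The key names used here are assumptions, not confirmed by this README.
    """
    warnings = []
    if values.get("mixed_precision") != "bf16":
        warnings.append("mixed_precision is not bf16")
    if values.get("distributed_type", "NO").upper() != "NO":
        warnings.append("distributed training is enabled")
    if values.get("use_cpu", "false").lower() == "true":
        warnings.append("CPU-only training is enabled")
    return warnings


if DEFAULT_CONFIG.exists():
    config = parse_flat_yaml(DEFAULT_CONFIG.read_text(encoding="utf-8"))
    print(check_single_gpu_config(config) or "config matches the single-GPU answers")
else:
    print(f"No config found at {DEFAULT_CONFIG}; run accelerate config first")
```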

Training and Inference

Training and inference procedures vary significantly by architecture. Please refer to the architecture-specific documents in the Documentation section and the various configuration documents for detailed instructions.

Miscellaneous

SageAttention Installation

sdbds has provided a Windows-compatible SageAttention implementation and pre-built wheels here: https://github.com/sdbds/SageAttention-for-windows. After installing triton, if your Python, PyTorch, and CUDA versions match, you can download and install the pre-built wheel from the Releases page. Thanks to sdbds for this contribution.

For reference, the build and installation instructions are as follows. You may need to update Microsoft Visual C++ Redistributable to the latest version.

Download and install triton 3.1.0 wheel matching your Python version from here.
Install Microsoft Visual Studio 2022 or Build Tools for Visual Studio 2022, configured for C++ builds.
Clone the SageAttention repository in your preferred directory:
```
git clone https://github.com/thu-ml/SageAttention.git
```
Open x64 Native Tools Command Prompt for VS 2022 from the Start menu under Visual Studio 2022.
Activate your venv, navigate to the SageAttention folder, and run the following command. If you get a DISTUTILS not configured error, run set DISTUTILS_USE_SDK=1 and try again:
```
python setup.py install
```

This completes the SageAttention installation.

PyTorch version

If you specify torch for --attn_mode, use PyTorch 2.5.1 or later (earlier versions may result in black videos).

If you use an earlier version, use xformers or SageAttention.
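
A quick way to check which case applies to your environment is sketched below. It reads the installed PyTorch version through the standard library, without importing torch, and compares it against the 2.5.1 minimum stated above. The helper names are hypothetical, and the version parsing is a simplification that ignores pre-release ordering.

```python
import re
from importlib import metadata

MIN_TORCH = (2, 5, 1)  # minimum stated above for --attn_mode torch


def parse_version(version):
    """Turn a string such as "2.5.1+cu124" into a comparable tuple (2, 5, 1)."""
    numbers = re.findall(r"\d+", version.split("+")[0])
    return tuple(int(n) for n in numbers[:3])


def torch_attention_ok(version):
    """True when the given PyTorch version meets the stated minimum."""
    return parse_version(version) >= MIN_TORCH


try:
    installed = metadata.version("torch")
except metadata.PackageNotFoundError:
    print("PyTorch is not installed in this environment")
else:
    if torch_attention_ok(installed):
        print(f"PyTorch {installed}: --attn_mode torch should be usable")
    else:
        print(f"PyTorch {installed}: prefer xformers or SageAttention")
```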

Disclaimer

This repository is unofficial and not affiliated with the official repositories of the supported architectures.

This repository is experimental and under active development. While we welcome community usage and feedback, please note:

This is not intended for production use
Features and APIs may change without notice
Some functionalities are still experimental and may not work as expected
Video training features are still under development

If you encounter any issues or bugs, please create an Issue in this repository with:

A detailed description of the problem
Steps to reproduce
Your environment details (OS, GPU, VRAM, Python version, etc.)
Any relevant error messages or logs
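
To gather the environment details for a report, something like the following sketch can be pasted into a Python session. It relies only on the standard library and, when available, the `nvidia-smi` command. The helper names are hypothetical, and non-NVIDIA GPUs are not covered.

```python
import platform
import shutil
import subprocess


def gpu_summary():
    """Return GPU name and total memory lines from nvidia-smi, or a fallback note."""
    if shutil.which("nvidia-smi") is None:
        return ["nvidia-smi not found on PATH"]
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return lines or ["nvidia-smi returned no GPU information"]


def environment_report():
    """Collect the environment details requested in the issue checklist."""
    report = [
        f"OS: {platform.platform()}",
        f"Python: {platform.python_version()}",
    ]
    report += [f"GPU: {line}" for line in gpu_summary()]
    return "\n".join(report)


print(environment_report())
```

Paste the printed lines into the Issue together with the error messages or logs.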

Contributing

We welcome contributions! Please see CONTRIBUTING.md for details.

License

Code under the hunyuan_model directory is modified from HunyuanVideo and follows their license.

Code under the wan directory is modified from Wan2.1 and is licensed under the Apache License 2.0.

Code under the frame_pack directory is modified from FramePack and is licensed under the Apache License 2.0.

Other code is under the Apache License 2.0. Some code is copied and modified from Diffusers.
