I'm trying to fine-tune a Stable Diffusion model on a set of images. I want to add two keywords (tokens?), pmauritania and cafrica, so that the model associates every image with both, using mauritania and africa as the starting vectors when learning these words. I want to be able to prompt the model with the new keywords and get results associated with both the initializer words and the images I'm giving it. I added both keywords at the beginning of each image caption; the captions are in .txt files in the same folder, with the same names as the corresponding jpgs. My training script:
SET MODEL_NAME="C:\Users\seanrm100\Desktop\Docs\Art\StableDiffuzion\finechune\vodka.pt"
SET INSTANCE_DIR="temp"
SET OUTPUT_DIR="loras"
lora_pti ^
--pretrained_model_name_or_path=%MODEL_NAME% ^
--instance_data_dir=%INSTANCE_DIR% ^
--output_dir=%OUTPUT_DIR% ^
--train_text_encoder ^
--resolution=512 ^
--train_batch_size=1 ^
--gradient_accumulation_steps=4 ^
--scale_lr ^
--learning_rate_unet=1e-4 ^
--learning_rate_text=1e-5 ^
--learning_rate_ti=5e-4 ^
--color_jitter ^
--lr_scheduler="linear" ^
--lr_warmup_steps=0 ^
--placeholder_tokens="cafrica|pmauritania" ^
--initializer_tokens="africa|mauritania" ^
--save_steps=100 ^
--max_train_steps_ti=1000 ^
--max_train_steps_tuning=1000 ^
--perform_inversion=True ^
--clip_ti_decay ^
--weight_decay_ti=0.000 ^
--weight_decay_lora=0.001^
--continue_inversion ^
--continue_inversion_lr=1e-4 ^
--device="cuda:0" ^
--lora_rank=1 ^
@REM --use_face_segmentation_condition^
pause
When I run it, I get:
PTI : Placeholder Tokens ['<pmauritania>']
PTI : Initializer Tokens ['<mauritania>']
...
raise ValueError("The initializer token must be a single token.")
ValueError: The initializer token must be a single token.
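For context, an error with this wording usually comes from a check that the initializer word encodes to exactly one token id. The sketch below assumes the check looks roughly like that and works with any object exposing `encode(text, add_special_tokens=False)`, the call shape used by Hugging Face tokenizers. `ToyTokenizer`, its three-piece vocabulary, and `check_initializer` are hypothetical stand-ins for illustration only; they say nothing about how the real CLIP tokenizer splits these words.

```python
class ToyTokenizer:
    """Hypothetical stand-in tokenizer: greedy longest-match over a tiny vocabulary."""

    def __init__(self, vocab):
        self.vocab = {piece: idx for idx, piece in enumerate(vocab)}

    def encode(self, text, add_special_tokens=False):
        ids, pos = [], 0
        while pos < len(text):
            # Try the longest remaining substring first, then shrink it.
            for end in range(len(text), pos, -1):
                piece = text[pos:end]
                if piece in self.vocab:
                    ids.append(self.vocab[piece])
                    pos = end
                    break
            else:
                raise KeyError(f"no vocabulary piece matches {text[pos:]!r}")
        return ids


def check_initializer(tokenizer, word):
    """Return the single token id for `word`, or raise if it encodes to several ids."""
    token_ids = tokenizer.encode(word, add_special_tokens=False)
    if len(token_ids) != 1:
        raise ValueError("The initializer token must be a single token.")
    return token_ids[0]


# Illustrative vocabulary (an assumption, not the real CLIP vocabulary).
toy = ToyTokenizer(["africa", "maur", "itania"])
print(toy.encode("africa"))      # one id
print(toy.encode("mauritania"))  # two ids, so check_initializer would raise
```

To run the same check against the real vocabulary, one option is to pass the tokenizer of the base model instead of `ToyTokenizer`, for example one loaded with `transformers.CLIPTokenizer.from_pretrained(...)`. If a word comes back as more than one id, it cannot serve as an initializer under a check like this.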
So I tried using only one token:
--placeholder_tokens="pmauritania" ^
--initializer_tokens="mauritania" ^
I get the same error. I also tried:
--placeholder_tokens="" ^
--initializer_tokens="mauritania" ^
and
--placeholder_tokens="" ^
--initializer_tokens="" ^
Both give the same error. Am I understanding the usage correctly? The documentation doesn't explain how to caption the images or exactly how to use these arguments.
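For reference, the captioning scheme described above (one .txt per .jpg, same file name, keywords at the start) can be sketched as follows. This only illustrates the file layout; whether lora_pti actually reads these .txt files is an assumption that has not been verified here, and `prepend_tokens` is a hypothetical helper, not part of any library.

```python
from pathlib import Path


def prepend_tokens(image_dir, tokens, image_suffix=".jpg"):
    """Ensure every image has a same-named .txt caption starting with `tokens`."""
    prefix = " ".join(tokens)
    caption_files = []
    for image in sorted(Path(image_dir).glob(f"*{image_suffix}")):
        caption_file = image.with_suffix(".txt")
        # Start from the existing caption, or an empty one if the file is missing.
        caption = ""
        if caption_file.exists():
            caption = caption_file.read_text(encoding="utf-8").strip()
        # Skip captions that already begin with the keywords, so reruns are harmless.
        if not caption.startswith(prefix):
            caption = f"{prefix} {caption}".strip()
            caption_file.write_text(caption + "\n", encoding="utf-8")
        caption_files.append(caption_file)
    return caption_files
```

Calling `prepend_tokens("temp", ["cafrica", "pmauritania"])` would turn a caption such as `a desert road` into `cafrica pmauritania a desert road`, and create a keywords-only caption for any image that has none.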