Code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"

limei1221/F5R-TTS

F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization

This repository is a simplified implementation of F5R-TTS, based on the paper "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization", and is intended for learning purposes.


Fig 1: The architecture of the backbone.
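Since the backbone is a flow-matching model, a minimal NumPy sketch of the standard conditional flow-matching objective may help when reading the training code. It assumes a straight (linear) path from Gaussian noise to data and a hypothetical `velocity_fn(x_t, t)` standing in for the network; the conditioning inputs (text, reference audio) are omitted, and the repository's actual pretraining loss may differ. `euler_sample` shows a fixed-step Euler ODE integrator, one common way to generate from a trained velocity field.

```python
import numpy as np


def flow_matching_loss(velocity_fn, x1, rng):
    """Conditional flow-matching loss for a batch of target features x1.

    velocity_fn(x_t, t) stands in for the backbone (hypothetical signature):
    it must return a prediction with the same shape as x_t.
    """
    x0 = rng.standard_normal(x1.shape)  # Gaussian noise sample
    # One time value per example, shaped to broadcast over the remaining axes.
    t = rng.uniform(size=(x1.shape[0],) + (1,) * (x1.ndim - 1))
    x_t = (1.0 - t) * x0 + t * x1       # point on the straight noise-to-data path
    target = x1 - x0                    # velocity of that path (constant in t)
    pred = velocity_fn(x_t, t)
    return float(np.mean((pred - target) ** 2))


def euler_sample(velocity_fn, x0, steps=32):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = np.array(x0, dtype=np.float64)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t)
    return x
```

Because the regression target `x1 - x0` is constant along each path, a well-trained model can generate with relatively few integration steps.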


Fig 2: The pipeline of the GRPO phase.
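GRPO replaces a learned value baseline with group statistics: several outputs are sampled for each input, and each output's reward is normalized by the mean and standard deviation of its own group. The NumPy sketch below shows that standard advantage computation plus a PPO-style clipped surrogate loss. The KL penalty term is omitted, and the function names and the `clip_eps=0.2` default are illustrative assumptions, not taken from this repository.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """rewards: array of shape (num_inputs, group_size), one row per sampled group."""
    r = np.asarray(rewards, dtype=np.float64)
    mean = r.mean(axis=1, keepdims=True)  # per-group baseline
    std = r.std(axis=1, keepdims=True)    # per-group scale
    return (r - mean) / (std + eps)


def clipped_surrogate_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative clipped policy objective (to be minimized); no KL term."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -float(np.mean(np.minimum(unclipped, clipped)))
```

For example, a group with rewards `[1, 2, 3]` yields advantages of roughly `[-1.22, 0, 1.22]`: only the relative ranking within the group matters, not the absolute reward scale.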

Installation

# Create a Python 3.10 conda environment (virtualenv also works)
conda create -n f5r-tts python=3.10
conda activate f5r-tts
pip install -r requirements.txt

Inference

python ./src/f5_tts/infer/infer_cli.py \
  --model F5TTS_v1_Base \
  --ckpt_file "your_model_path" \
  --ref_audio "path_to_reference.wav" \
  --ref_text "reference_text" \
  --gen_text "generated_text" \
  --output_dir ./tests
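To drive the same CLI from Python (for example, to generate several sentences in a loop), one option is to assemble the arguments as a list and pass them to `subprocess.run`. `build_infer_command` and `run_inference` are hypothetical helpers; only the script path and the flags are taken from the command above, and the script is assumed to be run from the repository root.

```python
import subprocess
import sys


def build_infer_command(ckpt_file, ref_audio, ref_text, gen_text,
                        model="F5TTS_v1_Base", output_dir="./tests"):
    """Return the argument list for infer_cli.py (flags as in the README)."""
    return [
        sys.executable, "./src/f5_tts/infer/infer_cli.py",
        "--model", model,
        "--ckpt_file", ckpt_file,
        "--ref_audio", ref_audio,
        "--ref_text", ref_text,
        "--gen_text", gen_text,
        "--output_dir", output_dir,
    ]


def run_inference(**kwargs):
    """Run one inference call; raises CalledProcessError if the CLI fails."""
    subprocess.run(build_infer_command(**kwargs), check=True)
```

Passing a list (rather than a single shell string) avoids quoting problems when `ref_text` or `gen_text` contains spaces or punctuation.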

Training

For the GRPO phase, download a pretrained WeSpeaker model and place it in the src/rl/wespeaker/multilingual directory.
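The README does not say how the WeSpeaker model is used; presumably it provides speaker embeddings for a speaker-similarity reward during GRPO (this is an assumption). A common way to score speaker similarity is the cosine similarity between two embedding vectors, sketched below; embedding extraction itself is not shown. `wespeaker_model_present` is a hypothetical pre-flight check that the directory exists and is non-empty.

```python
from pathlib import Path

import numpy as np


def wespeaker_model_present(root="src/rl/wespeaker/multilingual"):
    """True if the model directory exists and contains at least one entry."""
    path = Path(root)
    return path.is_dir() and any(path.iterdir())


def speaker_similarity(emb_a, emb_b, eps=1e-8):
    """Cosine similarity of two speaker embeddings, in [-1, 1].

    Assumed use: emb_a from the reference audio, emb_b from the generated audio.
    """
    a = np.asarray(emb_a, dtype=np.float64)
    b = np.asarray(emb_b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))
```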

# Configure Accelerate for your hardware
accelerate config

# Data preparation
python src/f5_tts/train/datasets/prepare_libritts.py

# Pretraining phase
accelerate launch src/f5_tts/train/train.py

# GRPO phase
accelerate launch src/f5_tts/train/train_rl.py
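The non-interactive training steps can also be chained from a small Python driver that runs them in the order listed above and stops at the first failure. This is a hypothetical convenience script, not part of the repository; it assumes `accelerate config` has already been run once (it is interactive) and that the script is launched from the repository root.

```python
import subprocess
import sys

# Steps in README order; `accelerate config` is assumed to have been run already.
TRAINING_STEPS = [
    ("data", [sys.executable, "src/f5_tts/train/datasets/prepare_libritts.py"]),
    ("pretrain", ["accelerate", "launch", "src/f5_tts/train/train.py"]),
    ("grpo", ["accelerate", "launch", "src/f5_tts/train/train_rl.py"]),
]


def run_pipeline(steps=TRAINING_STEPS, start_at=None):
    """Run steps in order, optionally resuming at a named step.

    subprocess.run(check=True) raises CalledProcessError on a non-zero exit,
    so a failed step halts everything after it.
    """
    names = [name for name, _ in steps]
    first = names.index(start_at) if start_at is not None else 0
    for name, cmd in steps[first:]:
        print(f"[{name}] {' '.join(cmd)}")
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline(start_at="grpo")` would rerun only the GRPO phase after the earlier steps have completed.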
