A Text To Speech node using Step-Audio-TTS in ComfyUI. Can speak, rap, sing, or clone voice.


Update

[2025-03-21] ⚒️: Completely refactored the code and added more tunable parameters. max_length can now be adjusted to suit the text length. The optional unload_model parameter controls whether the model is unloaded after use; keeping it loaded speeds up subsequent inference.

[2025-03-07] ⚒️: Custom speakers can be defined directly in ComfyUI\models\TTS\Step-Audio-speakers\speakers_info.json, so they no longer need to be entered in the node.

Move the Step-Audio-speakers folder from this repository to the ComfyUI\models\TTS folder. The structure is as follows:

ComfyUI\models\TTS
├── Step-Audio-Tokenizer
├── Step-Audio-speakers
├── Step-Audio-TTS-3B
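
The layout above can be checked with a few lines of Python before starting ComfyUI. This is only a minimal sketch: the helper name `missing_tts_dirs` is hypothetical and not part of this repository, and the `ComfyUI/models/TTS` path in the example is an assumption that should be adjusted to your installation.

```python
from pathlib import Path

# Folder names taken from the directory tree above.
EXPECTED_DIRS = ("Step-Audio-Tokenizer", "Step-Audio-speakers", "Step-Audio-TTS-3B")


def missing_tts_dirs(tts_root):
    """Return the expected sub-folders that are absent under tts_root."""
    root = Path(tts_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumed location; adjust to your own ComfyUI installation.
    missing = missing_tts_dirs(Path("ComfyUI") / "models" / "TTS")
    print("missing folders:", missing or "none")
```

An empty result means all three folders are in place; any names returned still need to be moved or downloaded.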

You can then freely add custom speakers under the ComfyUI\models\TTS\Step-Audio-speakers folder. Ensure that the speaker names in the configuration match exactly.

[2025-03-06] ⚒️: Added a new recording node, MW Audio Recorder, which records audio from a microphone; a progress bar shows the recording progress. Its parameters are:

Parameter	Description	Range	Notes
trigger	Recording trigger; set to True to start recording	Boolean (True/False)	Must change from False to True to activate
record_sec	Main recording duration (seconds)	1-60 (integer)	Actual duration
n_fft	FFT window size (affects frequency resolution)	512, 1024, ..., 4096 (multiples of 512)	Higher values give better frequency resolution
sensitivity	Noise reduction sensitivity (higher = more aggressive)	0.5-3.0 (step 0.1)	1.2 = standard office environment
smooth	Time-frequency smoothing (higher = more natural)	1, 3, 5, 7, 9, 11 (odd numbers)	Recommended: 5 for speech, 7 for music
sample_rate	Sampling rate (affects quality and file size)	16000/44100/48000 Hz	44100 = CD quality
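
The ranges in the table can be expressed as a small validation helper. The sketch below is illustrative only: `validate_recorder_params` is a hypothetical function, not part of MW Audio Recorder, and it only encodes the ranges listed above (the 0.1 step for sensitivity is not enforced).

```python
def validate_recorder_params(record_sec, n_fft, sensitivity, smooth, sample_rate):
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    # record_sec: integer number of seconds from 1 to 60.
    if not (isinstance(record_sec, int) and 1 <= record_sec <= 60):
        problems.append("record_sec must be an integer from 1 to 60")
    # n_fft: multiples of 512 between 512 and 4096.
    if not (isinstance(n_fft, int) and 512 <= n_fft <= 4096 and n_fft % 512 == 0):
        problems.append("n_fft must be a multiple of 512 from 512 to 4096")
    # sensitivity: 0.5 to 3.0 (step size is not checked here).
    if not 0.5 <= sensitivity <= 3.0:
        problems.append("sensitivity must be between 0.5 and 3.0")
    # smooth: odd values from 1 to 11.
    if smooth not in (1, 3, 5, 7, 9, 11):
        problems.append("smooth must be an odd number from 1 to 11")
    # sample_rate: one of the three listed rates in Hz.
    if sample_rate not in (16000, 44100, 48000):
        problems.append("sample_rate must be 16000, 44100, or 48000")
    return problems


if __name__ == "__main__":
    # Values suggested by the Notes column: office environment, speech, CD quality.
    print(validate_recorder_params(5, 1024, 1.2, 5, 44100))
```

Returning a list of messages rather than raising on the first error makes it easy to report every out-of-range value at once.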

[2025-03-02] ⚒️: Added an experimental custom_mark option. Wrap each mark in parentheses "()", for example (温柔)(东北话), meaning "gentle" and "Northeastern dialect". It may or may not affect the output.
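
To make the parenthesized format concrete, here is a minimal sketch that splits and rebuilds such a string. The helpers `split_marks` and `join_marks` are hypothetical and not part of the node; the sketch also assumes ASCII parentheses, as in the example above.

```python
import re

# Matches one mark wrapped in ASCII parentheses, e.g. "(温柔)".
MARK_PATTERN = re.compile(r"\(([^()]+)\)")


def split_marks(custom_mark):
    """Return the individual marks contained in a custom_mark string."""
    return MARK_PATTERN.findall(custom_mark)


def join_marks(marks):
    """Build a custom_mark string by wrapping each mark in parentheses."""
    return "".join(f"({mark})" for mark in marks)


if __name__ == "__main__":
    marks = split_marks("(温柔)(东北话)")
    print(marks)
    print(join_marks(marks))
```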

[2025-02-25] ⚒️: Added support for custom speakers via custom_speaker.

Installation

cd ComfyUI/custom_nodes
git clone https://github.com/billwuhao/ComfyUI_StepAudioTTS.git
cd ComfyUI_StepAudioTTS
pip install -r requirements.txt

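# If you use the embedded Python of a portable ComfyUI build (python_embeded),
# install with it instead; adjust the path to python.exe to your installation.
./python_embeded/python.exe -m pip install -r requirements.txt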

Model Download

Download the models to the ComfyUI\models\TTS folder.

Huggingface

Models	Links
Step-Audio-Tokenizer	🤗huggingface
Step-Audio-TTS-3B	🤗huggingface

Modelscope

Models	Links
Step-Audio-Tokenizer	modelscope
Step-Audio-TTS-3B	modelscope

Supports Chinese, English, Korean, Japanese, Sichuanese, Cantonese, etc.

Acknowledgements

Part of the code for this project comes from:

Thank you to all the open-source projects for their contributions to this project!
