A semantic–acoustic dual-stream speech codec achieving state-of-the-art performance in speech reconstruction and semantic representation across bitrates.
```bash
conda create -n sac python=3.10
conda activate sac
pip install -r requirements.txt  # pip version == 24.0
```

To use SAC, you need to prepare the pretrained dependencies, including the GLM-4-Voice-Tokenizer for semantic tokenization and the ERes2Net speaker encoder for speaker feature extraction (used during codec training). Make sure the corresponding model paths are correctly set in your configuration file (e.g., `configs/xxx.yaml`).
The following table lists the available SAC checkpoints:
| Model Name | Hugging Face | Sample Rate | Token Rate | BPS |
|---|---|---|---|---|
| SAC | 🤗 Soul-AILab/SAC-16k-37_5Hz | 16 kHz | 37.5 Hz | 525 |
| SAC | 🤗 Soul-AILab/SAC-16k-62_5Hz | 16 kHz | 62.5 Hz | 875 |
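As a sanity check on the table, the bitrate (BPS) is simply the token rate multiplied by the number of bits spent per frame; both checkpoints work out to 14 bits per frame. A minimal sketch of this arithmetic (the per-frame bit budget is derived from the table, not from the official configuration):

```python
# Bitrate (bits/s) = token rate (frames/s) x bits per frame.
def bps(token_rate_hz: float, bits_per_frame: int) -> float:
    return token_rate_hz * bits_per_frame

# Both released checkpoints spend 14 bits per frame:
# 37.5 Hz * 14 = 525 bps, 62.5 Hz * 14 = 875 bps.
assert bps(37.5, 14) == 525.0
assert bps(62.5, 14) == 875.0
```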
To perform audio reconstruction, you can use the following command:
```bash
python -m bins.infer
```

We also provide batch scripts for audio reconstruction, encoding, decoding, and embedding extraction in the `scripts/batch` directory as references (see the batch scripts guide for details).
You can run the following command to perform evaluation:
```bash
bash scripts/eval.sh
```

For details on dataset preparation and evaluation setup, please refer to the evaluation guide first.
Before training, organize your dataset in JSONL format. You can refer to example/training_data.jsonl. Each entry should include:
- `utt` — unique utterance ID (customizable)
- `wav_path` — path to the raw audio file
- `ssl_path` — path to offline-extracted Whisper features (for semantic supervision)
- `semantic_token_path` — path to offline-extracted semantic tokens
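The schema above can be sketched as follows; a minimal Python snippet that writes one JSONL entry (the utterance ID and all paths here are hypothetical placeholders):

```python
import json

# One entry per line; field names follow the schema above.
# Paths and the utterance ID are placeholders for illustration.
entries = [
    {
        "utt": "spk001_0001",
        "wav_path": "data/wavs/spk001_0001.wav",
        "ssl_path": "data/ssl/spk001_0001.npy",
        "semantic_token_path": "data/tokens/spk001_0001.npy",
    },
]

with open("training_data.jsonl", "w") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")
```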
To accelerate training, extract semantic tokens and Whisper features offline before starting. Refer to the feature extraction guide for detailed instructions.
You can adjust training and DeepSpeed configurations by editing:
- `configs/xxx.yaml` — main training configuration
- `configs/ds_stage2.json` — DeepSpeed configuration
Run the following script to start SAC training:
```bash
bash scripts/train.sh
```

Our codebase builds upon the awesome SparkVox and DAC. We thank the authors for their excellent work.
If you find this work useful in your research, please consider citing:
```bibtex
@article{chen2025sac,
  title={SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization},
  author={Chen, Wenxi and Wang, Xinsheng and Yan, Ruiqi and Chen, Yushen and Niu, Zhikang and Ma, Ziyang and Li, Xiquan and Liang, Yuzhe and Wen, Hanlin and Yin, Shunshun and others},
  journal={arXiv preprint arXiv:2510.16841},
  year={2025}
}
```

This project is licensed under the Apache 2.0 License.