Shuwei He he-shuwei

🌴

On vacation

email: [email protected] ; [email protected]

3 followers · 20 following

Inner Mongolia University, China
Hohhot
16:03 (UTC -12:00)

Highlights

ThinkSound Public
Forked from FunAudioLLM/ThinkSound

[NeurIPS 2025] PyTorch implementation of [ThinkSound], a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reasoning.

Python Updated Sep 19, 2025
Step-Audio2 Public
Forked from stepfun-ai/Step-Audio2

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python Apache License 2.0 Updated Sep 1, 2025
acad-homepage.github.io Public
Forked from RayeRen/acad-homepage.github.io

AcadHomepage: A Modern and Responsive Academic Personal Homepage

SCSS MIT License Updated Aug 9, 2025
Kimi-Audio Public
Forked from MoonshotAI/Kimi-Audio

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python Updated Apr 28, 2025
speech-to-speech Public
Forked from huggingface/speech-to-speech

Speech To Speech: an effort for an open-sourced and modular GPT4-o

Python Apache License 2.0 Updated Apr 15, 2025
Step-Audio Public
Forked from stepfun-ai/Step-Audio

Python Apache License 2.0 Updated Feb 18, 2025
VoiceBench Public
Forked from MatthewCYM/VoiceBench

VoiceBench: Benchmarking LLM-Based Voice Assistants

Python Apache License 2.0 Updated Feb 5, 2025
M2SE-VTTS Public

PyTorch Implementation of M2SE-VTTS (AAAI'25).

2 Updated Jan 21, 2025
SenseVoice Public
Forked from FunAudioLLM/SenseVoice

Multilingual Voice Understanding Model

Python Other Updated Jan 8, 2025
CosyVoice Public
Forked from FunAudioLLM/CosyVoice

Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Python Apache License 2.0 Updated Sep 29, 2024
VoiceLDM Public
Forked from glory20h/VoiceLDM

VoiceLDM: Text-to-Speech with Environmental Context

Python Apache License 2.0 Updated Aug 9, 2024
S2Lab_MOS Public
Forked from Coder-jzq/S2Lab_MOS

jiazhenqi

Vue 2 Updated Jul 9, 2024
MERTools Public
Forked from zeroQiaoba/MERTools

Toolkits for Multimodal Emotion Recognition

Python 1 Updated May 7, 2024
Diff-BGM Public
Forked from sizhelee/Diff-BGM

official code for CVPR'24 paper Diff-BGM

Python 1 Updated Mar 28, 2024
novel-view-acoustic-synthesis Public
Forked from facebookresearch/novel-view-acoustic-synthesis

Code for Novel View Acoustic Synthesis paper

Python Other Updated Aug 14, 2023
DiffSinger Public
Forked from MoonInTheRiver/DiffSinger

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code

Python 1 MIT License Updated May 2, 2023
ProDiff Public
Forked from Rongjiehuang/ProDiff

PyTorch Implementation of ProDiff (ACM-MM'22) with an Extremely-Fast diffusion speech synthesis pipeline

Python 1 MIT License Updated Apr 19, 2023
NATSpeech Public
Forked from NATSpeech/NATSpeech

A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)

Python 1 MIT License Updated Apr 2, 2023
visual-acoustic-matching Public
Forked from facebookresearch/visual-acoustic-matching

Repo for Visual Acoustic Matching, CVPR 2022

Python Other Updated Feb 28, 2023
MUSIC-AVQA Public
Forked from GeWu-Lab/MUSIC-AVQA

MUSIC-AVQA, CVPR2022 (ORAL)

Python MIT License Updated Dec 30, 2022
python-MCD Public
Forked from ttslr/python-MCD

Python Updated May 3, 2020
