- Inner Mongolia University, China
- Hohhot
- ThinkSound Public
  Forked from FunAudioLLM/ThinkSound
  [NeurIPS 2025] PyTorch implementation of [ThinkSound], a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reasoning.
  Python Updated Sep 19, 2025
- Step-Audio2 Public
  Forked from stepfun-ai/Step-Audio2
  Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
  Python Apache License 2.0 Updated Sep 1, 2025
- acad-homepage.github.io Public
  Forked from RayeRen/acad-homepage.github.io
  AcadHomepage: A Modern and Responsive Academic Personal Homepage
  SCSS MIT License Updated Aug 9, 2025
- Kimi-Audio Public
  Forked from MoonshotAI/Kimi-Audio
  Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
  Python Updated Apr 28, 2025
- speech-to-speech Public
  Forked from huggingface/speech-to-speech
  Speech To Speech: an effort for an open-sourced and modular GPT4-o
  Python Apache License 2.0 Updated Apr 15, 2025
- VoiceBench Public
  Forked from MatthewCYM/VoiceBench
  VoiceBench: Benchmarking LLM-Based Voice Assistants
  Python Apache License 2.0 Updated Feb 5, 2025
- SenseVoice Public
  Forked from FunAudioLLM/SenseVoice
  Multilingual Voice Understanding Model
  Python Other Updated Jan 8, 2025
- CosyVoice Public
  Forked from FunAudioLLM/CosyVoice
  Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
  Python Apache License 2.0 Updated Sep 29, 2024
- VoiceLDM Public
  Forked from glory20h/VoiceLDM
  VoiceLDM: Text-to-Speech with Environmental Context
  Python Apache License 2.0 Updated Aug 9, 2024
- MERTools Public
  Forked from zeroQiaoba/MERTools
  Toolkits for Multimodal Emotion Recognition
- Diff-BGM Public
  Forked from sizhelee/Diff-BGM
  official code for CVPR'24 paper Diff-BGM
- novel-view-acoustic-synthesis Public
  Forked from facebookresearch/novel-view-acoustic-synthesis
  Code for Novel View Acoustic Synthesis paper
  Python Other Updated Aug 14, 2023
- DiffSinger Public
  Forked from MoonInTheRiver/DiffSinger
  DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
- ProDiff Public
  Forked from Rongjiehuang/ProDiff
  PyTorch Implementation of ProDiff (ACM-MM'22) with a Extremely-Fast diffusion speech synthesis pipeline
- NATSpeech Public
  Forked from NATSpeech/NATSpeech
  A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)
- visual-acoustic-matching Public
  Forked from facebookresearch/visual-acoustic-matching
  Repo for Visual Acoustic Matching, CVPR 2022
  Python Other Updated Feb 28, 2023
- MUSIC-AVQA Public
  Forked from GeWu-Lab/MUSIC-AVQA
  MUSIC-AVQA, CVPR2022 (ORAL)
  Python MIT License Updated Dec 30, 2022