Wendong Gan WendongGan

🎯

Focusing

Research interests: Speech Algorithm, LLM, NLP

77 followers · 465 following

UESTC
Chengdu, China

Easy-Turn Public
Forked from ASLP-lab/Easy-Turn

Open-Source Turn-Taking Detection Model and Dataset for Full-Duplex Spoken Dialogue Systems

Python Apache License 2.0 Updated Oct 12, 2025
Ming-UniAudio Public
Forked from inclusionAI/Ming-UniAudio

Python MIT License Updated Oct 4, 2025
MiMo-Audio Public
Forked from XiaomiMiMo/MiMo-Audio

Python Apache License 2.0 Updated Sep 20, 2025
train-higgs-audio-jimmyMa99 Public
Forked from JimmyMa99/train-higgs-audio

Text-audio foundation model from Boson AI

Python Updated Sep 4, 2025
WenetSpeech-Yue Public
Forked from ASLP-lab/WenetSpeech-Yue

A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation

Python Apache License 2.0 Updated Sep 4, 2025
mair-hub Public
Forked from nvidia-china-sae/mair-hub

Jupyter Notebook Apache License 2.0 Updated Aug 29, 2025
CarelessWhisper-Streaming Public
Forked from tomer9080/CarelessWhisper-Streaming

Causal streaming adaptation of OpenAI Whisper for real-time transcription on small audio chunks.

Python Other Updated Aug 21, 2025
wavesurfer Public
Forked from pengzhendong/wavesurfer

For audio visualization and playback in Jupyter notebooks.

Python BSD 2-Clause "Simplified" License Updated Aug 14, 2025
FluidAudio Public
Forked from FluidInference/FluidAudio

Fully Native Swift and CoreML. Efficient Speaker Diarization, VAD, and Speech-to-Text for realtime workloads

Swift Apache License 2.0 Updated Aug 14, 2025
Cosyvoice_DPO_NOTES Public
Forked from ScottishFold007/Cosyvoice_DPO_NOTES

CosyVoice_DPO_NOTES: Supercharge Your Cosyvoice model with Cutting-Edge DPO Fine-Tuning!

Python Updated Aug 8, 2025
happy-llm Public
Forked from datawhalechina/happy-llm

📚 A from-scratch tutorial on the principles and practice of large language models

Jupyter Notebook Other Updated Jul 19, 2025
fireredasr-streaming Public
Forked from xphh/fireredasr-streaming

low-latency realtime ASR based on FireRedASR

Python MIT License Updated Jul 8, 2025
CosyVoice Public
Forked from FunAudioLLM/CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python Apache License 2.0 Updated Jun 30, 2025
finetune-index-tts Public
Forked from yrom/finetune-index-tts

IndexTTS Fine-tuning notebooks

Jupyter Notebook MIT License Updated Jun 17, 2025
icefall Public
Forked from k2-fsa/icefall

Python Apache License 2.0 Updated May 27, 2025
GenVC Public
Forked from caizexin/GenVC

Self-supervised Generative LM-based Voice Conversion

Python MIT License Updated Apr 16, 2025
async_cosyvoice Public
Forked from qi-hua/async_cosyvoice

Accelerating cosyvoice2 inference with vllm

Jupyter Notebook Apache License 2.0 Updated Apr 13, 2025
audioseal Public
Forked from facebookresearch/audioseal

Localized watermarking for AI-generated speech audios, with SOTA on robustness and very fast detector

Python MIT License Updated Mar 27, 2025
CFPRF Public
Forked from ItzJuny/CFPRF

[ACM MM'24] Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization

Python MIT License Updated Dec 20, 2024
WavChat Public
Forked from jishengpeng/WavChat

A Survey of Spoken Dialogue Models (60 pages)

Updated Nov 12, 2024
minimind Public
Forked from jingyaogong/minimind

[Large model] Train a small 26M-parameter GPT entirely from scratch in 3 hours; inference and training both run on a personal GPU!

Python Apache License 2.0 Updated Nov 10, 2024
litgpt Public
Forked from Lightning-AI/litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Python Apache License 2.0 Updated Nov 1, 2024
scoreq Public
Forked from alessandroragano/scoreq

SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)

Python Updated Oct 18, 2024
F5-TTS Public
Forked from SWivid/F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python MIT License Updated Oct 10, 2024
GTSinger Public
Forked from AaronZ345/GTSinger

Dataset and code of GTSinger(NeurIPS 2024 Spotlight): A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks

Python Other Updated Oct 10, 2024
mamba-diarization Public
Forked from nttcslab-sp/mamba-diarization

Official repository for Mamba-based Segmentation Model for Speaker Diarization

Python Other Updated Oct 10, 2024
reverb Public
Forked from revdotcom/reverb

Open source inference code for Rev's model

Python Other Updated Oct 7, 2024
SLAM-LLM Public
Forked from X-LANCE/SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Python MIT License Updated Oct 5, 2024
SSR-Speech Public
Forked from WangHelin1997/SSR-Speech

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis

Python MIT License Updated Sep 22, 2024
TTS-arxiv-daily Public
Forked from liutaocode/TTS-arxiv-daily

Automatically Update Text-to-speech (TTS) Papers Daily using Github Actions (Update Every 12 Hours)

Python Apache License 2.0 Updated Sep 22, 2024
