A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 13,384 1,355 Updated Oct 1, 2025

FunAudioLLM / CosyVoice

Multilingual large voice generation model with full-stack support for inference, training, and deployment.

Python 17,195 1,883 Updated Oct 21, 2025

coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 43,371 5,749 Updated Aug 16, 2024

modelscope / 3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Python 2,547 228 Updated Oct 30, 2025

ARM-software / ML-KWS-for-MCU

Keyword spotting on Arm Cortex-M Microcontrollers

C 1,209 426 Updated Apr 10, 2019

ga642381 / speech-trident

Awesome speech/audio LLMs, representation learning, and codec models

1,171 72 Updated Aug 13, 2025

ddlBoJack / Awesome-Speech-Language-Model

Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.

187 14 Updated Nov 10, 2024

shuaijiang / Awesome-Speech-Language-Model

Forked from ddlBoJack/Awesome-Speech-Language-Model

Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.

1 Updated Dec 11, 2024

shuaijiang / STRAIGHT

This is a speech analysis, modification and synthesis system

MATLAB 53 28 Updated Oct 18, 2021

ASLP-lab / DiffRhythm2

Forked from xiaomi-research/diffrhythm2

Di♪♪Rhythm 2: Efficient And High Fidelity Song Generation Via Block Flow Matching

Python 82 2 Updated Nov 9, 2025

facebookresearch / WavAugment

A library for speech data augmentation in time-domain

Python 677 59 Updated Aug 30, 2021

nanobrowser / nanobrowser

Open-Source Chrome extension for AI-powered web automation. Run multi-agent workflows using your own LLM API key. Alternative to OpenAI Operator.

TypeScript 11,257 1,128 Updated Nov 9, 2025

diet103 / claude-code-infrastructure-showcase

Examples of my Claude Code infrastructure with skill auto-activation, hooks, and agents

Shell 5,398 683 Updated Oct 31, 2025

microsoft / ai-agents-for-beginners

12 Lessons to Get Started Building AI Agents

Jupyter Notebook 44,217 14,968 Updated Nov 7, 2025

microsoft / AI

Microsoft AI

Python 2,163 602 Updated May 10, 2025

Yuan-ManX / ai-audio-datasets

AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…

862 81 Updated Jul 8, 2025

Neutone / neutone_sdk

Join the community on Discord for more discussions around Neutone! https://discord.gg/VHSMzb8Wqp

Python 564 29 Updated Nov 2, 2025

deezer / spleeter

Deezer source separation library including pretrained models.

Python 27,734 3,048 Updated Apr 2, 2025

kyutai-labs / nanoGPTaudio

Forked from karpathy/nanoGPT

Code for the blog "Neural audio codecs: how to get audio into LLMs"

Python 129 3 Updated Oct 20, 2025

evan2jiang

Lists (22)

🔮 Future ideas

TTS

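Active noise cancellation

Signal fundamentals

Echo

Large models

Embedded

Room acoustics

Speaker protection

Productivity tools

Finite element

Robot audio

Roundup

Resource roundups

Beamforming

Testing

Blind source separation

Glasses

Spatial audio

Speech enhancement

Sound effects

Wind noise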

Starred repositories

diffusion-models