DonkeyHang's starred repositories

A modern GUI client based on Tauri, designed to run on Windows, macOS, and Linux for a tailored proxy experience

TypeScript · 91,411 stars · 6,699 forks · Updated Jan 10, 2026

A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows

Python · 202 stars · 14 forks · Updated Jan 8, 2026

A 10,000+ hour dataset for Chinese speech recognition

Shell · 583 stars · 51 forks · Updated Jan 9, 2026

https://hf.co/hexgrad/Kokoro-82M

JavaScript · 5,292 stars · 601 forks · Updated Aug 6, 2025

Speaker anonymization pipeline for hiding the identity of the speaker of a recording by changing the voice in it.

Shell · 89 stars · 10 forks · Updated Jul 4, 2025

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Python · 2,706 stars · 238 forks · Updated Dec 8, 2025

[CVPR 2025] "DiC: Rethinking Conv3x3 Designs in Diffusion Models", a performant & speedy Conv3x3 diffusion model.

Python · 240 stars · 20 forks · Updated Jun 12, 2025

[Interspeech 2025] DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec

Jupyter Notebook · 59 stars · 8 forks · Updated Dec 24, 2025

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Python · 1,253 stars · 106 forks · Updated Mar 2, 2025

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python · 1,688 stars · 167 forks · Updated Dec 5, 2025

Android NDK samples with Android Studio

C++ · 10,443 stars · 4,253 forks · Updated Oct 3, 2025

Official repository for FlowSE (Interspeech 2025)

JavaScript · 84 stars · 9 forks · Updated Jul 9, 2025

TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.

Python · 862 stars · 132 forks · Updated Dec 24, 2025

[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)

Python · 1,095 stars · 186 forks · Updated Jan 5, 2026

TTS with Kokoro and ONNX Runtime

Python · 2,330 stars · 239 forks · Updated Dec 22, 2025

[AAAI 2025] EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning

Python · 4,165 stars · 453 forks · Updated Aug 5, 2025

Audio-FLAN

Jupyter Notebook · 160 stars · 5 forks · Updated Sep 23, 2025

This project uses a variety of advanced voiceprint recognition models, such as EcapaTdnn, ResNetSE, ERes2Net, and CAM++, and more models may be supported in the future. At the …

Python · 1,221 stars · 164 forks · Updated Dec 17, 2025

Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation"

Python · 3,170 stars · 281 forks · Updated Jan 8, 2026

dog-can-sing-song

Python · 46 stars · 5 forks · Updated Jan 9, 2026

Noise suppression using deep filtering

Python · 40 stars · 8 forks · Updated Aug 20, 2025

Limiter, compressor, convolver, equalizer, auto volume, and many other plugins for PipeWire applications

HTML · 8,646 stars · 331 forks · Updated Jan 10, 2026

MNN ASR demo.

C++ · 23 stars · 2 forks · Updated Mar 24, 2025

Unofficial PyTorch implementation of SoundStream, with training code and a 16 kHz pretrained checkpoint

Python · 77 stars · 12 forks · Updated Jan 11, 2026

LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation with Spoken Language Models" (arXiv 2024).

91 stars · 3 forks · Updated Dec 28, 2024

SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.

Python · 856 stars · 94 forks · Updated Oct 10, 2025

🚀 [Large Models] Train a 26M-parameter multimodal vision-language model (VLM) from scratch in just 1 hour!

Python · 5,931 stars · 633 forks · Updated Dec 27, 2025

🚀🚀 [Large Models] Train a small 26M-parameter GPT entirely from scratch in just 2 hours!

Python · 36,882 stars · 4,379 forks · Updated Jan 7, 2026

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python · 3,818 stars · 310 forks · Updated Aug 14, 2025

PyTorch Implementation of TCSinger(EMNLP 2024): Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control

Python · 370 stars · 44 forks · Updated Oct 7, 2025