liuhuang31

TTS, NLP, VC.

66 followers · 59 following

Tencent
Shenzhen, China

Lists (16)

ASR

Audio LLM Codec

88 repositories

Fun

42 repositories

MLLM

33 repositories

Nice Technology

NLP LLM

44 repositories

Singing synthesis

Singing vc

Speech enhancement

TTS

131 repositories

TTS Dataset

20 repositories

TTS Frontend

TTS Frontend Dataset

TTS tools

audio process, align...

60 repositories

TTS Vocoder

17 repositories

VC

10 repositories

Stars

GiantAILab / YingMusic-SVC

Official implementation of YingMusic-SVC.

Python 44 5 Updated Nov 27, 2025

roudimit / Omni-R1

[ASRU 2025] Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?

Python 21 1 Updated Nov 21, 2025

stepfun-ai / Step-Audio-R1

Python 295 19 Updated Nov 27, 2025

rasbt / reasoning-from-scratch

Implement a reasoning LLM in PyTorch from scratch, step by step

Jupyter Notebook 2,117 282 Updated Nov 25, 2025

AkaliKong / MiniOneRec

Minimal reproduction of OneRec

Python 545 76 Updated Nov 26, 2025

ShawnPi233 / HQ-SVC

Official repository of the paper "Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios" (AAAI 2026)

43 Updated Nov 18, 2025

MuyangDu / T5Voice

T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech synthesis with zero-shot capabilities.

Python 28 5 Updated Nov 7, 2025

stepfun-ai / Step-Audio-EditX

A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech

Python 712 47 Updated Nov 28, 2025

ASLP-lab / MeanVC

A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows

Python 163 8 Updated Nov 24, 2025

Andong-Li-speech / BridgeVoC

This is the repository for the work "BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective".

Python 56 3 Updated Nov 5, 2025

shaochenze / calm

Official implementation of "Continuous Autoregressive Language Models"

Python 638 76 Updated Nov 10, 2025

JarodMica / index-tts

Forked from index-tts/index-tts

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Python 88 22 Updated Nov 15, 2025

alibaba / unified-audio

An Open-Source Project to Unify Audio Processing and Generation

HTML 63 5 Updated Nov 5, 2025

kaistmm / AlignDiT

[ACM MM 2025] AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation

Python 22 2 Updated Oct 28, 2025

Soul-AILab / SoulX-Podcast

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python 2,389 284 Updated Nov 27, 2025

XiaomiMiMo / MiMo-Audio-Training

Python 86 9 Updated Oct 16, 2025

yaobiao131 / downkyicore

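downkyi (cross-platform edition), a video downloader for the Bilibili website. Supports batch downloads, 8K, HDR, and Dolby Vision, and provides a toolbox (audio/video extraction, watermark removal, etc.).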

C# 5,634 411 Updated Nov 26, 2025

jjunak-yun / FLowHigh_code

[ICASSP 2025] "FLowHigh: Towards efficient and high-quality audio super-resolution with single-step flow matching"

Python 89 12 Updated Jan 17, 2025

hans0809 / MiniMind-in-Depth

A source-code walkthrough of the lightweight large language model MiniMind, covering the complete pipeline: tokenizer, RoPE, MoE, KV Cache, pretraining, SFT, LoRA, DPO, and more

449 42 Updated Jun 16, 2025

jisang93 / VISinger

Unofficial PyTorch implementation of VISinger: Variational Inference with Adversarial Learning for End-to-end Singing Voice Synthesis (ICASSP, 2022)

Python 19 2 Updated May 12, 2023

deepseek-ai / DeepSeek-OCR

Contexts Optical Compression

Python 21,009 1,852 Updated Oct 25, 2025

meituan-longcat / LongCat-Audio-Codec

LongCat Audio Tokenizer and Detokenizer

Python 257 18 Updated Nov 25, 2025

GiantAILab / DiaMoE-TTS

Official code for "DiaMoE-TTS: A Unified IPA-based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation"

Python 186 15 Updated Nov 28, 2025

karpathy / nanochat

The best ChatGPT that $100 can buy.

Python 37,784 4,646 Updated Nov 17, 2025

xkx-hub / KALL-E

Python 31 5 Updated Sep 25, 2025

haoweilou / ParaStyleTTS

This is the official code for ACM CIKM 2025 Paper: ParaStyleTTS: Toward Efficient and Robust Paralinguistic Style Control for Expressive Text-to-Speech Generation

Python 43 6 Updated Oct 24, 2025

zaigie / FunSpeech

An out-of-the-box speech service for local, private deployment; quickly sets up FunASR and CosyVoice2 backends

Python 40 10 Updated Nov 29, 2025

neuphonic / neutts-air

On-device TTS model by Neuphonic

Python 4,100 415 Updated Nov 18, 2025

inclusionAI / Ming-UniAudio

Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation

Python 395 28 Updated Nov 27, 2025

wenet-e2e / west

We Speech Toolkit, LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction

Python 158 11 Updated Nov 18, 2025