National Taiwan University
- Taiwan
- https://scholar.google.com.tw/citations?user=w5F00dYAAAAJ&hl=zh-TW
Highlights
- Pro
Lists (31)
attention
audio preprocess
Audio-Visual
AVSE3
Bias
Challenge
codec
Cuda
cv
Dataset
deepfake
ECG
energy_efficient_streamin_SE
Addition is all you need for energy-efficient streaming speech enhancement
GNN
knowledge distillation
leetcode
LLM
mamba
multichannel
Optimal Transport
paper reading
pytorch-study
speech assessment
Speech enhancement
Speech Separation
SSL
text&speech
TTS
urgent-Challenge2026
vocal burst
Experiment-tracking tools
Stars
Training, inference, and testing of the SAC speech codec model.
🚀🚀 "Large models": train a small 26M-parameter GPT completely from scratch in just 2 hours! 🌏
LongCat Audio Tokenizer and Detokenizer
Game-Time: Evaluating Temporal Dynamics in Spoken Language Models
Python - 100 Days from Novice to Master
We Speech Toolkit, an LLM-based speech toolkit for speech understanding, generation, and interaction
🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data
Data Pipeline, Models, and Benchmark for Omni-Captioner.
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining
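🥢 Cook like Lao Xiang Ji (Home Original Chicken) 🐔. The main part was completed in 2024; this is not an official Lao Xiang Ji repository. The text comes from the *Lao Xiang Ji Dish Traceability Report* and has been summarized, edited, and organized. CookLikeHOC.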
Official code of ICML 2025 paper "NTPP: Generative Speech Language Modeling for Dual-Channel Spoken Dialogue via Next-Token-Pair Prediction"
A complete computer science study plan to become a software engineer.
HighRateMOS is the first non-intrusive MOS prediction model that explicitly models sampling rates, achieving first place in five out of eight metrics in AudioMOS Challenge 2025 Track 3.
Official baseline for ICASSP 2026 URGENT Challenge Track 2 (Speech Quality Assessment)
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
This repository provides a benchmark for prompt injection attacks and defenses
(ICASSP 2025, official code) FlowSE: Flow Matching-based Speech Enhancement
Extract phoneme-level timestamps from speech audio.
A Benchmark for Evaluating Turn-Taking and Overlap Handling in Full-Duplex Spoken Dialogue Models
SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)
Official implementation of the paper "Pre-trained Speech Processing Models Contain Human-Like Biases that Propagate to Speech Emotion Recognition"