- Westlake University
- Hangzhou, China
- saoyear.github.io
Stars
Official PyTorch implementation of 'Rec-RIR: Monaural Blind Room Impulse Response Identification via DNN-based Reverberant Speech Reconstruction in STFT Domain'
Extracted YouTube-8M URLs and labels, without all the TFRecord parsing/features
Code for ICLR 2025 Paper: Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
PyTorch implementation of 'Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation'
Conformer-based Metric GAN for speech enhancement
Open implementation of UNIVERSE and UNIVERSE++ diffusion-based speech enhancement models.
Official implementation of LiSenNet
Download AudioSet for Vision-Audio-Text Pre-training
PyTorch implementation of "CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR".
Official PyTorch implementation of 'VINP: Variational Bayesian Inference with Neural Speech Prior for Joint ASR-Effective Speech Dereverberation and Blind RIR Identification' [IEEE TASLP]
Masked Modeling Duo: Towards a Universal Audio Pre-training Framework
PyTorch implementation of a cosine-annealing learning-rate scheduler with linear warmup
A PyTorch-based implementation of the compression and decompression module in "Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression".
StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a ca…
MOS prediction with a fine-tuned wav2vec 2.0 model
Generation scripts for EARS-WHAM and EARS-Reverb
UT-Sarulab MOS prediction system using SSL models
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
This repository aims to collect Transformer-based sound event detection (SED) algorithms.
A description of "RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization" [NeurIPS 2024]
A generative speech model for daily dialogue.
Baseline code for DCASE 2023 Task 4B