Stars
The TTSDS benchmark evaluates synthetic speech quality by considering prosody, speaker identity, and intelligibility, comparing these factors with real speech and noise datasets.
Analyzing and Improving Speaker Similarity Assessment in Speech Synthesis
Speech Human Evaluation Estimation Toolkit (SHEET)
Evaluation code for the Interspeech publication "Towards Frame-level Quality Predictions of Synthetic Speech". Evaluate frame-level representations of MOS predictors.
Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
A TTS model capable of generating ultra-realistic dialogue in one pass.
A Python package to build AI-powered real-time audio applications
A Conversational Speech Generation Model
Unified automatic quality assessment for speech, music, and sound.
Reimplementation of Bandit for "Remastering Divide and Remaster: A Cinematic Audio Source Separation Dataset with Multilingual Support"
Repository for training models for music source separation.
Simple and fast HTTP framework for Mojo! 🔥🐝
This repository is an implementation of this article: https://arxiv.org/pdf/2107.03312.pdf
junkblocker / codesearch
Forked from google/codesearch. Fork of Google codesearch with more options
Lightning fast code searching made easy
Haptic input knob with software-defined endstops and virtual detents
Layout algorithms for visualizing directed acyclic graphs
This code provides a word-level language identification tool for identifying the language of individual words in code-mixed text, e.g. text that includes words from two languages such as Hindi writt…
RAG-based tool for indexing and searching PDF text data using the OpenAI API and a FAISS (Facebook AI Similarity Search) index, designed for rapid information retrieval and superior search accuracy.
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
ReadingBank: A Benchmark Dataset for Reading Order Detection
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
🔊 Text-Prompted Generative Audio Model
A C/C++ library for fast interval overlap queries (with a "bedtools coverage" example)