- Waseda univ.
- Tokyo
- https://yutoab.github.io/portfolio/
- https://www.docswell.com/user/yuAbe
- https://qiita.com/yuAbe
Stars
Official code of ICML 2025 paper "NTPP: Generative Speech Language Modeling for Dual-Channel Spoken Dialogue via Next-Token-Pair Prediction"
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
🦜🔗 The platform for reliable agents.
A Benchmark for Evaluating Turn-Taking and Overlap Handling in Full-Duplex Spoken Dialogue Models
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
Voice Activity Projection Models: Self-supervised learning of Turn-taking Events
A real-time and light-weight software for generation of non-linguistic behaviors (turn-taking, backchannel, and head-nodding) in conversational AIs
Library for fast text representation and classification.
Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.
A Tree Search Library with Flexible API for LLM Inference-Time Scaling
Speech Resynthesis and Language Modeling
JATTS: A modern, research-oriented Japanese Text-to-speech Open-sourced Toolkit
litagin02 / Style-Bert-VITS2
Forked from fishaudio/Bert-VITS2
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles.
Dialogue Speech Corpus with Audio-visual Egocentric Information, "So, what are you Speaking, Listening, and Watching?"
Code for evaluating Japanese pretrained models provided by NTT Ltd.
RealPersonaChat: A Realistic Persona Chat Corpus with Interlocutors' Own Personalities
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Tools for handling multimodal data in machine learning projects.
Unsupervised text tokenizer for Neural Network-based text generation.
The Remdis toolkit: Building advanced real-time multimodal dialogue systems with incremental processing and large language models
YosukeHiguchi / espnet
Forked from espnet/espnet
End-to-End Speech Processing Toolkit
Tensors and Dynamic neural networks in Python with strong GPU acceleration