abePclWaseda
💭 For research

Official code of ICML 2025 paper "NTPP: Generative Speech Language Modeling for Dual-Channel Spoken Dialogue via Next-Token-Pair Prediction"

Python 134 21 Updated Oct 27, 2025

Complex Function Calling Benchmark.

Python 160 23 Updated Jan 20, 2025
4 Updated Oct 7, 2025

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,552 216 Updated Dec 30, 2025

✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

Python 360 24 Updated May 27, 2025

🦜🔗 The platform for reliable agents.

Python 123,429 20,340 Updated Jan 4, 2026

A Benchmark for Evaluating Turn-Taking and Overlap Handling in Full-Duplex Spoken Dialogue Models

Python 112 4 Updated Sep 21, 2025

NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment

Python 902 147 Updated Dec 1, 2024

Voice Activity Projection Models: Self-supervised learning of Turn-taking Events

Python 89 18 Updated May 29, 2024

Real-time, lightweight software for generating non-linguistic behaviors (turn-taking, backchannels, and head nodding) in conversational AI

Python 74 9 Updated Dec 29, 2025

Library for fast text representation and classification.

HTML 26,467 4,815 Updated Mar 22, 2024

Papers, code, and resources for speech language models and end-to-end speech dialogue systems.

190 14 Updated Nov 10, 2024

A Tree Search Library with Flexible API for LLM Inference-Time Scaling

Python 510 65 Updated Dec 9, 2025

Speech Resynthesis and Language Modeling

Python 27 4 Updated Jun 11, 2025

JATTS: A modern, research-oriented Japanese Text-to-speech Open-sourced Toolkit

Python 44 1 Updated May 26, 2025

Survey of audio language models

Jupyter Notebook 60 3 Updated Oct 25, 2025

Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles.

Python 1,125 167 Updated Dec 7, 2025

Dialogue Speech Corpus with Audio-visual Egocentric Information, "So, what are you Speaking, Listening, and Watching?"

Python 9 Updated Aug 13, 2024

Code for evaluating Japanese pretrained models provided by NTT Ltd.

Python 245 22 Updated Jun 21, 2023

RealPersonaChat: A Realistic Persona Chat Corpus with Interlocutors' Own Personalities

63 Updated Mar 13, 2024

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook 8,920 988 Updated Dec 13, 2025

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python 19,420 2,079 Updated Oct 21, 2025

Tools for handling multimodal data in machine learning projects.

Python 1,099 257 Updated Dec 15, 2025
Python 45 5 Updated Oct 14, 2025

Unsupervised text tokenizer for Neural Network-based text generation.

C++ 11,559 1,318 Updated Jan 1, 2026

The Remdis toolkit: Building advanced real-time multimodal dialogue systems with incremental processing and large language models

Python 102 21 Updated Jun 4, 2025

End-to-End Speech Processing Toolkit

Python 15 3 Updated Jan 20, 2025

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 96,336 26,419 Updated Jan 5, 2026