- National Taiwan University
- Taiwan
- (UTC +08:00)
- https://kehan.lu
- @kehan_lu
Starred repositories
Game-Time: Evaluating Temporal Dynamics in Spoken Language Models
HighRateMOS is the first non-intrusive MOS prediction model that explicitly models sampling rates, achieving first place in five out of eight metrics in AudioMOS Challenge 2025 Track3.
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
A fancy self-hosted monitoring tool
A list of publicly available audio data that anyone can download for ASR or other speech activities
An open-source AI agent that brings the power of Gemini directly into your terminal.
PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Leaderboard and code for "Speech-IFEval", Interspeech 2025
Collection of works for evaluating (and analyzing) large audio-language models (LALMs)
Official GitHub repository for paper "SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information" (Interspeech 2025)
⏩ Ship faster with Continuous AI. Open-source CLI that can be used in TUI mode as a coding agent or Headless mode to run background agents
A Benchmark for Evaluating Turn-Taking and Overlap Handling in Full-Duplex Spoken Dialogue Models
A TTS model capable of generating ultra-realistic dialogue in one pass.
A collaborative note-taking, wiki, and documentation platform that scales. Built with Django and React.
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
Train transformer language models with reinforcement learning.
A privacy-first, self-hosted, fully open-source personal knowledge management tool, written in TypeScript and Go.
Voice gender classifier using ECAPA-TDNN