entn-at

Ewald Enzinger entn-at

Ph.D. EE (UNSW Sydney). ML, speaker recognition, speech recognition, speech synthesis, forensic voice comparison

118 followers · 331 following

SpeechJudge Public
Forked from AmphionTeam/SpeechJudge

[Under development] SpeechJudge: Towards Human-Level Judgment for Speech Naturalness (https://speechjudge.github.io/)

Python Updated Nov 14, 2025
DiFlow-TTS Public
Forked from ishine/DiFlow-TTS

DiFlow-TTS: Discrete Flow Matching with Factorized Speech Tokens for Low-Latency Zero-Shot Text-to-Speech

Python Updated Nov 8, 2025
calm Public
Forked from shaochenze/calm

Official implementation of "Continuous Autoregressive Language Models"

Python MIT License Updated Nov 8, 2025
flow_grpo Public
Forked from yifan123/flow_grpo

An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL

Python MIT License Updated Nov 8, 2025
T5Voice Public
Forked from MuyangDu/T5Voice

T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech synthesis with zero-shot capabilities.

Python Apache License 2.0 Updated Nov 7, 2025
speaker_disentangled_hubert Public
Forked from ryota-komatsu/speaker_disentangled_hubert

Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"

Python MIT License Updated Nov 5, 2025
ca-subtitle Public
Forked from JaesungHuh/ca-subtitle

Implementation of "Look, Listen and Recognise: character-aware audio-visual subtitling"

Python Apache License 2.0 Updated Nov 3, 2025
UniVoice Public
Forked from gwh22/UniVoice

UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models

Python Updated Oct 30, 2025
unified-audio Public
Forked from alibaba/unified-audio

An Open-Source Project to Unify Audio Processing and Generation

HTML Apache License 2.0 Updated Oct 29, 2025
SoulX-Podcast Public
Forked from Soul-AILab/SoulX-Podcast

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python Apache License 2.0 Updated Oct 28, 2025
unmute Public
Forked from kyutai-labs/unmute

Make text LLMs listen and speak

Python MIT License Updated Oct 25, 2025
moshi Public
Forked from kyutai-labs/moshi

Python Apache License 2.0 Updated Oct 25, 2025
ARC-Encoder Public
Forked from kyutai-labs/ARC-Encoder

Python Apache License 2.0 Updated Oct 24, 2025
LSCodec-Inference Public
Forked from X-LANCE/LSCodec-Inference

Inference code for Interspeech 2025 paper, "LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec"

Python MIT License Updated Oct 23, 2025
cmvdr Public
Forked from Screeen/cmvdr

Official repo for "MVDR Beamforming for Cyclostationary Processes".

Python MIT License Updated Oct 22, 2025
whistle Public
Forked from hon9kon9ize/whistle

Text-Only Domain Adaptation for Pretrained Speech Recognition Transformers

Python Updated Oct 20, 2025
StreamVoiceAnon Public
Forked from Plachtaa/StreamVoiceAnon

Real-time streaming voice anonymization & voice conversion

Python Apache License 2.0 Updated Oct 20, 2025
transcribe-rs Public
Forked from cjpais/transcribe-rs

a simple transcription library for rust

Rust MIT License Updated Oct 17, 2025
Conan Public
Forked from User-tian/Conan

Official Implementation of "Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion"

Python MIT License Updated Oct 16, 2025
makepad Public
Forked from makepad/makepad

Makepad is a creative software development platform for Rust that compiles to wasm/webGL, osx/metal, windows/dx11, linux/opengl

WebAssembly MIT License Updated Oct 13, 2025
Ming-UniAudio Public
Forked from inclusionAI/Ming-UniAudio

Python MIT License Updated Oct 12, 2025
ggwave Public
Forked from ggerganov/ggwave

Tiny data-over-sound library

C++ MIT License Updated Oct 11, 2025
rustfst Public
Forked from garvys-org/rustfst

Rust library for Weighted Finite State Transducers as described by Mohri and Allauzen

Rust Other Updated Oct 11, 2025
speech_resynth Public
Forked from ryota-komatsu/speech_resynth

Speech Resynthesis using Conditional Flow Matching and HuBERT Units

Python MIT License Updated Oct 11, 2025
tract Public
Forked from sonos/tract

Tiny, no-nonsense, self contained, Tensorflow and ONNX inference

Rust Other Updated Oct 11, 2025
Amphion Public
Forked from open-mmlab/Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python MIT License Updated Oct 11, 2025
gemma3-object-detection Public
Forked from ariG23498/gemma3-object-detection

Fine tune Gemma 3 on an object detection task

Python Updated Oct 11, 2025
ZipVoice Public
Forked from k2-fsa/ZipVoice

Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

Python Apache License 2.0 Updated Oct 11, 2025
rten Public
Forked from robertknight/rten

ONNX neural network inference engine

Rust Updated Oct 11, 2025
RNDVoC Public
Forked from Andong-Li-speech/RNDVoC

This is the official repository of "Scalable Neural Vocoder from Range-Null Space Decomposition", which is submitted to TPAMI.

Python MIT License Updated Oct 11, 2025
