- Portland, Oregon
- https://entn.at/
- @[email protected]
- @entn_at
- @entn-at.bsky.social
SpeechJudge Public
Forked from AmphionTeam/SpeechJudge
[Under development] SpeechJudge: Towards Human-Level Judgment for Speech Naturalness (https://speechjudge.github.io/)
Python Updated Nov 14, 2025
DiFlow-TTS Public
Forked from ishine/DiFlow-TTS
DiFlow-TTS: Discrete Flow Matching with Factorized Speech Tokens for Low-Latency Zero-Shot Text-to-Speech
Python Updated Nov 8, 2025
calm Public
Forked from shaochenze/calm
Official implementation of "Continuous Autoregressive Language Models"
Python MIT License Updated Nov 8, 2025
flow_grpo Public
Forked from yifan123/flow_grpo
An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
Python MIT License Updated Nov 8, 2025
T5Voice Public
Forked from MuyangDu/T5Voice
T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech synthesis with zero-shot capabilities.
Python Apache License 2.0 Updated Nov 7, 2025
speaker_disentangled_hubert Public
Forked from ryota-komatsu/speaker_disentangled_hubert
Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"
Python MIT License Updated Nov 5, 2025
ca-subtitle Public
Forked from JaesungHuh/ca-subtitle
Implementation of "Look, Listen and Recognise: character-aware audio-visual subtitling"
Python Apache License 2.0 Updated Nov 3, 2025
UniVoice Public
Forked from gwh22/UniVoice
UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models
Python Updated Oct 30, 2025
unified-audio Public
Forked from alibaba/unified-audio
An Open-Source Project to Unify Audio Processing and Generation
HTML Apache License 2.0 Updated Oct 29, 2025
SoulX-Podcast Public
Forked from Soul-AILab/SoulX-Podcast
SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.
Python Apache License 2.0 Updated Oct 28, 2025
unmute Public
Forked from kyutai-labs/unmute
Make text LLMs listen and speak
Python MIT License Updated Oct 25, 2025
ARC-Encoder Public
Forked from kyutai-labs/ARC-Encoder
Python Apache License 2.0 Updated Oct 24, 2025
LSCodec-Inference Public
Forked from X-LANCE/LSCodec-Inference
Inference code for the Interspeech 2025 paper "LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec"
Python MIT License Updated Oct 23, 2025
cmvdr Public
Forked from Screeen/cmvdr
Official repo for "MVDR Beamforming for Cyclostationary Processes".
Python MIT License Updated Oct 22, 2025
whistle Public
Forked from hon9kon9ize/whistle
Text-Only Domain Adaptation for Pretrained Speech Recognition Transformers
Python Updated Oct 20, 2025
StreamVoiceAnon Public
Forked from Plachtaa/StreamVoiceAnon
Real-time streaming voice anonymization & voice conversion
Python Apache License 2.0 Updated Oct 20, 2025
transcribe-rs Public
Forked from cjpais/transcribe-rs
A simple transcription library for Rust
Rust MIT License Updated Oct 17, 2025
Conan Public
Forked from User-tian/Conan
Official Implementation of "Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion"
Python MIT License Updated Oct 16, 2025
makepad Public
Forked from makepad/makepad
Makepad is a creative software development platform for Rust that compiles to wasm/webGL, osx/metal, windows/dx11, and linux/opengl
WebAssembly MIT License Updated Oct 13, 2025
ggwave Public
Forked from ggerganov/ggwave
Tiny data-over-sound library
C++ MIT License Updated Oct 11, 2025
rustfst Public
Forked from garvys-org/rustfst
Rust library for Weighted Finite State Transducers as described by Mohri and Allauzen
Rust Other Updated Oct 11, 2025
speech_resynth Public
Forked from ryota-komatsu/speech_resynth
Speech Resynthesis using Conditional Flow Matching and HuBERT Units
Python MIT License Updated Oct 11, 2025
tract Public
Forked from sonos/tract
Tiny, no-nonsense, self-contained TensorFlow and ONNX inference
Rust Other Updated Oct 11, 2025
Amphion Public
Forked from open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Python MIT License Updated Oct 11, 2025
gemma3-object-detection Public
Forked from ariG23498/gemma3-object-detection
Fine tune Gemma 3 on an object detection task
Python Updated Oct 11, 2025
ZipVoice Public
Forked from k2-fsa/ZipVoice
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
Python Apache License 2.0 Updated Oct 11, 2025
rten Public
Forked from robertknight/rten
ONNX neural network inference engine
Rust Updated Oct 11, 2025
RNDVoC Public
Forked from Andong-Li-speech/RNDVoC
This is the official repository of "Scalable Neural Vocoder from Range-Null Space Decomposition", which is submitted to TPAMI.
Python MIT License Updated Oct 11, 2025