- ElevenLabs
- Trondheim, Norway
- @iver56
Starred repositories
LongCat Audio Tokenizer and Detokenizer
Retrieval-Augmented MOS Prediction with Prior Knowledge Integration
[ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning"
Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.
IIR Hilbert filter: short, dependency-free, header-only C++
Cloud AI live transcription and translation service plugin
YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
A general purpose task-agnostic speech augmentation policy
[NeurIPS 2024] VFIMamba: Video Frame Interpolation with State Space Models
VoXtream is a Full-Stream Zero-shot TTS model with Extremely Low Latency
Banquet: A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems
Audio decoding libraries for C/C++, each in a single source file.
Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Optimized primitives for collective multi-GPU communication
A feature-rich command-line audio/video downloader