Stars
Native Node.js addon for capturing per-app audio on macOS using ScreenCaptureKit. Real-time audio streaming with event-based API
A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning
Official codebase for the paper "Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations"
MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
Frontier CoreML audio models in your apps — text-to-speech, speech-to-text, voice activity detection, and speaker diarization. In Swift, powered by SOTA open source.
Krea Realtime 14B. An open-source realtime AI video model.
Unlimited-length talking video generation that supports image-to-video and video-to-video generation
Text to speech alignment using CTC forced alignment
Realtime messaging based on Socket.IO and Redis
Realtime Audio SDK for the Web — audio capture, echo cancellation (AEC), voice activity detection (VAD), and real-time encoding (Opus/PCM).
The official implementation of GTCRN, an ultra-lightweight SE model.
LoRA-based phoneme/prosody control for LLM-based TTS with no G2P - Lightweight adapter for editing and controlling the target language's phoneme-level pronunciation and prosody while preserving other lang…
Wan: Open and Advanced Large-Scale Video Generative Models
Fully featured mobile and web client for Codex and Claude Code, with realtime voice and encryption
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
A Unified Framework for Expressive Speech Synthesis with Voice Cloning
Text-audio foundation model from Boson AI
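A comprehensive platform for Chinese text correction that combines academic research, model training, model evaluation, and inference deployment, covering two core areas: spelling correction and grammatical error correction.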
The TypeScript AI agent framework. ⚡ Assistants, RAG, observability. Supports any LLM: GPT-4, Claude, Gemini, Llama.
A codebase for crawling and preprocessing data for training TTS and ASR systems.
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.
An AI agent development platform with all-in-one visual tools, simplifying agent creation, debugging, and deployment like never before. Coze your way to AI Agent creation.
VoiceBench: Benchmarking LLM-Based Voice Assistants