Stars
Streaming ASR and TTS based on FastAPI + sherpa-onnx
Development repository for the Triton language and compiler
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without an Internet connection. Supports embedded systems, Andr…
Omni SenseVoice: High-Speed Speech Recognition with words timestamps 🗣️🎯
AI speech solutions for tasks such as ASR, vocal extraction, accompaniment extraction, audio denoising, and enhancement. Supports models such as paraformer, sensevoice, fireredasr, zipformer, moonsh…
C# library for decoding Paraformer and SenseVoice models, used in speech recognition (ASR)
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
Added vLLM support to IndexTTS for faster inference.
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
🚀 The fast, Pythonic way to build MCP servers and clients
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
MedEvalKit: A Unified Medical Evaluation Framework
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
📊 Simple package for monitoring and controlling your NVIDIA Jetson [Orin, Xavier, Nano, TX] series
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
Official implementation for the paper "Full-Order Sampling-Based MPC for Torque-Level Locomotion Control via Diffusion-Style Annealing". DIAL-MPC is a novel sampling-based MPC framework for legged …
Python interface for unitree sdk2