Stars
A fully open-source humanoid arm for physical AI research and deployment in contact-rich environments.
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
[Up-to-date] Large Language Model Agent: A Survey on Methodology, Applications and Challenges
An open-source bipedal robot control framework, based on nonlinear MPC and WBC, tailored for the EC-hunter80-v01 bipedal robot.
[RSS 2025 Best Systems Paper Finalist] 💐Official implementation of "Learning Humanoid Standing-up Control across Diverse Postures"
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…
Real-time Voice Activity Detection (VAD) with example use cases such as a simple voice bot and live (real-time) transcription.
Manifold is a platform for enabling workflow automation using AI assistants.
No fortress, purely open ground. OpenManus is Coming.
A nearly-live implementation of OpenAI's Whisper.
Real time transcription with OpenAI Whisper.
Kura is a simple reproduction of the CLIO paper which uses language models to label user behaviour before clustering them based on embeddings recursively. This helps us understand user behaviour on…
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
tikikun / f5-tts-mlx-quantized
Forked from lucasnewman/f5-tts-mlx
Implementation of F5-TTS in MLX
Towards Human-Friendly, Fast Learning and Adaptable Agent Communities
This is a single-speaker neural text-to-speech (TTS) system capable of training in an end-to-end fashion. It is inspired by the Tacotron architecture and able to train based on unaligned text-audio pa…
This repository contains the SpeechBrain Benchmarks
An open-source text-to-speech system built by inverting Whisper.
Music repair method to convert lossy MP3 compressed music to lossless music.
Thesis LaTeX Template for Nanyang Technological University (NTU)
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction