Stars
Kontra2B / librealsense
Forked from realsenseai/librealsense
Intel® RealSense™ SDK
PantoMatrix: Generating Face and Body Animation from Speech
Implementation of the AES (Advanced Encryption Standard) system as a C program
Code and dataset for photorealistic Codec Avatars driven from audio
AudioLDM training, fine-tuning, evaluation, and inference.
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML …
A curated list of awesome vision and language resources for earth observation.
A neural network model that detects five different male/female emotions from speech audio. (Deep Learning, NLP, Python)
Core engine for singing voice conversion and singing voice cloning
Realtime Voice Changer
Self-Supervised Speech Pre-training and Representation Learning Toolkit
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
A Python project that uses several standard or otherwise very common libraries to determine the key a song (an .mp3) is in, e.g. F major or C# minor, with annotations and some examples.
A Machine Learning Approach to Emotion Modeling
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available at https://plachtaa.github.io/vallex/
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Foundational Models for State-of-the-Art Speech and Text Translation
Official PyTorch implementation of Contrastive Learning of Musical Representations
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using Llama mode…
Official implementation of the paper "Acoustic Music Understanding Model with Large-Scale Self-supervised Training".
Pronounced as "musician", musicnn is a set of pre-trained deep convolutional neural networks for music audio tagging.