- Beatoven.ai
- United Kingdom
- https://jake-drysdale.github.io/blog/
- @jakedrysdale6
Stars
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
MemVerse: Multimodal Memory for Lifelong Learning Agents
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
Repository for the paper "Combining audio control and style transfer using latent diffusion", accepted at ISMIR 2024
BandCondiNet: Parallel Transformers-based Conditional Popular Music Generation with Multi-View Features
A Python framework for symbolic music generation, evaluation and analysis
State of the Art of Music Generation with Deep Learning and AI
⚡ Finetune Wav2vec 2.0 For Speech Recognition
A repo that builds text-to-music datasets from scratch, used in MuseControlLite [ICML 2025]
Code for the paper “Automatic Music Sample Identification with Multi-Track Contrastive Learning”.
MuseControlLite: Multifunctional Music Generation with Lightweight Conditioners [ICML 2025]
Nodes for image juxtaposition for Flux in ComfyUI
"Fx-Encoder++: Extracting Instrument-wise Audio Effect Representations from Mixtures"
Official repo of "On Exact Inversion of DPM-Solvers" by Hong et al., in CVPR 2024.
JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment
Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.
A deep learning project for automated chorus detection in songs, featuring a command-line interface (CLI) tool that allows users to input a YouTube link and utilize a pre-trained CRNN model to dete…
AI tool for full-song music production within REAPER digital audio workstation.
Digital Catalog of Florence Price's Songs with Metadata
[NeurIPS 2023] UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models
MelodySim: Measuring Melody-aware Music Similarity for Plagiarism Detection
Memory layer for AI Agents. Replace complex RAG pipelines with a serverless, single-file memory layer. Give your agents instant retrieval and long-term memory.
[NeurIPS 2025 D&B] Open-source Multi-agent Poster Generation from Papers
ACE-Step: A Step Towards Music Generation Foundation Model
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
PixelHacker: Image Inpainting with Structural and Semantic Consistency
[TMLR 2025🔥] A survey for the autoregressive models in vision.