Stars
"DeepCode: Open Agentic Coding (Paper2Code & Text2Web & Text2Backend)"
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬
Official implementation of paper: Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech
A lightweight LMM-based Document Parsing Model
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
Text-audio foundation model from Boson AI
ACE-Step: A Step Towards Music Generation Foundation Model
DFloat11: Lossless LLM Compression for Efficient GPU Inference
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
OmniGen2: Exploration to Advanced Multimodal Generation.
Foundational lecture notes for Tonmeister (sound engineering) students (2000)
Fully Local Manus AI. No APIs, No $200 monthly bills. Enjoy an autonomous agent that thinks, browses the web, and codes for the sole cost of electricity. 🔔 Official updates only via twitter @Martin9…
FLUX, Stable Diffusion, SDXL, SD3, LoRA, Fine Tuning, DreamBooth, Training, Automatic1111, Forge WebUI, SwarmUI, DeepFake, TTS, Animation, Text To Video, Tutorials, Guides, Lectures, Courses, Comfy…
A Python tool that uses GPT-4, FFmpeg, and OpenCV to automatically analyze videos, extract the most interesting sections, and crop them for an improved viewing experience.
Official code for CCSRv2 and CCSRv1: Improving the Stability and Efficiency of Diffusion Models for Content Consistent Super-Resolution
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
Machine Learning for Imbalanced Data, published by Packt
Instructional notebooks on music information retrieval.
Understanding Deep Learning - Simon J.D. Prince
[ECCV 2024] Code for DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior