Contexts Optical Compression

Python 19,709 1,389 Updated Oct 25, 2025

Instant voice cloning by MIT and MyShell. Audio foundation model.

Python 35,381 3,888 Updated Apr 19, 2025

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Python 62,825 9,258 Updated Nov 6, 2025

Faster Whisper transcription with CTranslate2

Python 18,890 1,569 Updated Oct 31, 2025

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python 18,606 1,972 Updated Oct 21, 2025

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Python 58,773 9,372 Updated Sep 23, 2025

Multilingual Document Layout Parsing in a Single Vision-Language Model

Python 5,594 562 Updated Oct 31, 2025

FULL Augment Code, Claude Code, Cluely, CodeBuddy, Comet, Cursor, Devin AI, Junie, Kiro, Leap.new, Lovable, Manus Agent Tools, NotionAI, Orchids.app, Perplexity, Poke, Qoder, Replit, Same.dev, Trae…

94,653 25,535 Updated Nov 1, 2025

Tesseract Open Source OCR Engine (main repository)

C++ 70,736 10,360 Updated Oct 13, 2025

SkyReels-V2: Infinite-length Film Generative model

Python 4,897 693 Updated Aug 11, 2025

Speech To Speech: an effort for an open-sourced and modular GPT4-o

Python 4,225 485 Updated Apr 15, 2025

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 3,089 215 Updated May 19, 2025

WhatsApp MCP server

Go 5,044 776 Updated Jul 13, 2025

Document intelligence framework for Python - Extract text, metadata, and structured data from PDFs, images, Office documents, and more. Built on Pandoc, PDFium, and Tesseract.

HTML 2,490 109 Updated Nov 6, 2025

A library that provides an embedded python distribution to be usable from inside golang

Go 320 30 Updated Jan 2, 2025

Build custom inference engines for models, agents, multi-modal systems, RAG, pipelines and more.

Python 3,683 254 Updated Nov 4, 2025

API service for docling document conversion

Python 37 9 Updated Feb 20, 2025

Running Docling as an API service

Python 907 202 Updated Oct 31, 2025

Get your documents ready for gen AI

Python 43,130 3,088 Updated Nov 6, 2025

scripts for working with open.bible data

Shell 25 14 Updated Jan 24, 2022

Your one-stop solution for voice dataset creation

Python 127 24 Updated Dec 10, 2023

ChatShell is a productivity tool for the command-line, powered by OpenAI's GPT-3 language model. It helps users find shell commands quickly and easily, reducing the need to search online and improv…

Go 2 Updated Mar 25, 2023

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 1,962 244 Updated Oct 16, 2025

Free MLOps course from DataTalks.Club

Jupyter Notebook 13,586 2,720 Updated Oct 15, 2025

Runtime installer for Python applications

Rust 1,840 53 Updated Nov 1, 2025

OCR, layout analysis, reading order, table recognition in 90+ languages

Python 18,838 1,286 Updated Oct 21, 2025

Rembg is a tool to remove image backgrounds

Python 20,951 2,160 Updated Oct 25, 2025

Source code for the X Recommendation Algorithm

Scala 67,704 12,618 Updated Sep 8, 2025

🎤⌨️ Acoustic keyboard eavesdropping

C++ 8,923 605 Updated Jan 15, 2023

AfroLID is a powerful neural toolkit for African language identification, covering 517 African languages.

Python 33 10 Updated Mar 10, 2025