Stars
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
The open-source, cross-platform API client for GraphQL, REST, WebSockets, SSE and gRPC, with Cloud, Local and Git storage.
🧌 Parsing structured information from OCR outputs
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Rapid fuzzy string matching in Python using various string metrics
CodeVisualizer is a powerful VS Code extension that provides two main visualization capabilities: function-level flowcharts for understanding code control flow, and codebase-level dependency graphs…
Toonify: Compact data format reducing LLM token usage by 30-60%
Tesseract Open Source OCR Engine (main repository)
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
AI agents, MCPs and AI workflow automation (~400 MCP servers for AI agents)
🎨 Ready-to-use DeepSeek-OCR Web UI | Modern Interface | 7 Recognition Modes | Batch Processing | Real-time Logging | Fully Responsive
Tensors and Dynamic neural networks in Python with strong GPU acceleration
The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.
Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
SymSpell: 1 million times faster spelling correction & fuzzy search through the Symmetric Delete spelling correction algorithm
Pure Python spell checking: http://pyspellchecker.readthedocs.io/en/latest/
Port of Google's language-detection library to Python.
An interpretable regression model in Python with Random-Forest-level accuracy
ContextGem: Effortless LLM extraction from documents
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
ripgrep recursively searches directories for a regex pattern while respecting your gitignore
A developer-friendly API for converting numerous document formats into PDF files, and more!
Deploy any AI model, agent, database, RAG, and pipeline locally or remotely in minutes
Free and Open Source, Distributed, RESTful Search Engine