
    Repositories list

    • kreuzberg

      Public
      A polyglot document intelligence framework with a Rust core. Extract text, metadata, and structured information from PDFs, Office documents, images, and 50+ formats. Available for Rust, Python, Ruby, Go, and TypeScript/Node.js—or use via CLI, REST API, or MCP server.
      HTML
      1343.2k41 · Updated Dec 30, 2025
    • High performance and CommonMark compliant HTML to Markdown converter. Maintained by the Kreuzberg.dev team. Kreuzberg.dev is a fast, polyglot document intelligence engine with a Rust core. It extracts structured data from 56+ document formats using streaming parsers and built-in OCR.
      HTML
      4145410 · Updated Dec 30, 2025
    • .github

      Public
      Kreuzberg.dev is a fast, polyglot document intelligence engine with a Rust core. It extracts structured data from 56+ document formats using streaming parsers and built-in OCR. Designed for RAG pipelines, batch workloads, and production deployments.
      0000 · Updated Dec 29, 2025
    • A high-level idiomatic Rust wrapper around Pdfium, the C++ PDF library used by the Google Chromium project.
      Rust
      105100 · Updated Dec 25, 2025