♕ RAG
Adding guardrails to large language models.
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website …
DocLLM: A layout-aware generative language model for multimodal document understanding
Get clean data from tricky documents, powered by vision-language models ⚡
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieval Results in RAG Systems (WWW 2025)
NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extraction uses specialized NVIDIA NIM microservices to find, con…
LlamaIndex is the leading framework for building LLM-powered agents over your data.
A curated list of awesome synthetic data tools (open source and commercial).