Releases: google/langextract
v1.0.9
What's New
Features
- Prompt alignment validation for few-shot examples (#215)
  - Validates that example extractions exist in their source text
  - Three modes: OFF, WARNING (default), ERROR
  - New parameters: `prompt_validation_level` and `prompt_validation_strict` (see the sketch after this list)
- Vertex AI authentication support for Gemini provider (#60)
- llama-cpp-python community provider added (#202)
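For illustration, a minimal sketch of the new validation parameters. The string value for `prompt_validation_level` is assumed to mirror the OFF/WARNING/ERROR modes above, and the example data and `gemini-2.5-flash` model ID are placeholders, not part of this release:

```python
import langextract as lx

examples = [
    lx.data.ExampleData(
        text="Patient was given 250 mg ibuprofen.",
        extractions=[
            lx.data.Extraction(
                extraction_class="medication",
                extraction_text="ibuprofen",  # appears verbatim in the example text, so it aligns
            )
        ],
    )
]

# With ERROR-level validation, an example whose extraction_text is absent
# from its source text would raise instead of merely logging a warning.
result = lx.extract(
    text_or_documents="Patient took 500 mg of acetaminophen at noon.",
    prompt_description="Extract medication mentions.",
    examples=examples,
    model_id="gemini-2.5-flash",
    prompt_validation_level="ERROR",  # assumed string form of the ERROR mode
)
```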
Improvements
- Changed the default to `debug=False` in `extract()` for cleaner output
- Fixed router typings for provider plugins (#190)
- Allow T-prefixed TypeVars in pylint (#194)
Full Changelog: v1.0.8...v1.0.9
v1.0.8
What's Changed
Features
- Ollama timeout improvements (#154)
  - Increased default timeout from 30s to 120s
  - Made timeout configurable via ModelConfig (sketched after this list)
  - Fixed kwargs not being passed through
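A hedged sketch of the configurable timeout, assuming `lx.factory.ModelConfig` forwards provider kwargs to the Ollama provider. The `timeout` kwarg name follows the fix above, and the model ID and URL are illustrative:

```python
import langextract as lx

# Route provider-specific options (here, a 300 s timeout overriding the
# new 120 s default) through ModelConfig rather than hard-coding them.
config = lx.factory.ModelConfig(
    model_id="gemma2:2b",
    provider_kwargs={
        "model_url": "http://localhost:11434",
        "timeout": 300,  # seconds; name assumed from the kwargs fix above
    },
)

result = lx.extract(
    text_or_documents="...",   # your input text
    prompt_description="...",  # your task description
    examples=[],               # your few-shot examples
    config=config,
)
```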
Documentation
- Improved visualization examples for Jupyter/Colab (#153)
  - Added Romeo & Juliet Colab notebook
Full Changelog: v1.0.7...v1.0.8
v1.0.7
What's New
- Debug logging support when `debug=True` in `lx.extract()` (#142)
- GPT-5 model registration fixes (#143)
- Improved documentation for provider plugins and schema support
- Automated plugin generator script for external providers
- Base URL support for OpenAI-compatible endpoints (#138)
See the full changelog for details.
v1.0.6 - Custom Model Provider Plugins & Schema System Refactor
Major Features
Custom Model Provider Plugin Support
- New provider registry infrastructure for extending LangExtract with custom LLM providers
- Plugin discovery via entry points allows third-party packages to register providers
- Example implementation available at examples/custom_provider_plugin
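As a rough sketch of what a third-party provider might look like. The `registry.register` decorator, `BaseLanguageModel` base class, and `ScoredOutput` type are assumptions modeled on the bundled example; see `examples/custom_provider_plugin` for the authoritative shape:

```python
from langextract import inference
from langextract.providers import registry

# Assumed plugin surface: model IDs matching the pattern route to this
# provider, and a package would expose the class via a
# "langextract.providers" entry point so it is discovered automatically.
@registry.register(r'^my-model')
class MyProvider(inference.BaseLanguageModel):

    def __init__(self, model_id='my-model-v1', **kwargs):
        super().__init__()
        self.model_id = model_id

    def infer(self, batch_prompts, **kwargs):
        # One list of scored candidates per input prompt.
        for _prompt in batch_prompts:
            yield [inference.ScoredOutput(score=1.0, output='{"extractions": []}')]
```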
Schema System Refactor
- Refactored schema system to support provider-specific schema implementations
- Providers can now define their own schema constraints and validation
- Better separation of concerns between core schema logic and provider implementations
Enhancements
- Ollama Provider: Added support for Hugging Face style model IDs (e.g., `meta-llama/Llama-3.2-1B-Instruct`)
- Extract API: Added `model` and `config` parameters to `extract()` for more flexible model configuration
- Examples: Updated Ollama quickstart to demonstrate the ModelConfig pattern with JSON mode
- Testing: Improved test infrastructure for provider registry and plugin system
Bug Fixes
- Fixed lazy loading for provider pattern registration
- Fixed unicode escaping in example generation
- Fixed test failures related to provider registry initialization
Installation
```bash
pip install langextract==1.0.6
```
Full Changelog: v1.0.5...v1.0.6
LangExtract v1.0.5
What's Changed
Bug Fixes
- Fix chunking bug when newlines fall at chunk boundaries (#88)
  - Resolves issue where content was incorrectly chunked when newline characters appeared at chunk boundaries
- Fix IPython import warnings and improve notebook detection (#86)
  - Eliminates import warnings in Jupyter notebooks and improves compatibility
New Features
- Add `base_url` parameter to `OpenAILanguageModel` (#51)
  - Enables using custom OpenAI-compatible endpoints for alternative LLM providers
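A brief sketch of the new parameter, assuming an OpenAI-compatible server is listening locally; the endpoint URL, key, and model name are illustrative:

```python
from langextract import inference

# Point the OpenAI client at a custom OpenAI-compatible endpoint.
model = inference.OpenAILanguageModel(
    model_id="gpt-4o-mini",
    api_key="sk-placeholder",  # many local servers ignore the key
    base_url="http://localhost:8000/v1",
)
```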
Full Changelog: v1.0.4...v1.0.5
v1.0.4 - Ollama integration and improvements
What's Changed
- Added Ollama language model integration – Full support for local LLMs via Ollama
- Docker deployment support – Production-ready docker-compose setup with health checks
- Comprehensive examples – Quickstart script and detailed documentation in `examples/ollama/`
- Fixed `OllamaLanguageModel` parameter – Changed from `model` to `model_id` for consistency (#57)
- Enhanced CI/CD – Added Ollama integration tests that run on every PR
- Improved documentation – Consistent API examples across all language models
Technical Details
- Supports all Ollama models (gemma2:2b, llama3.2, mistral, etc.)
- Secure setup with localhost-only binding by default
- Integration tests use lightweight models for faster CI runs
- Docker setup includes automatic model pulling and health checks
Usage Example
```python
import langextract as lx

# v1.0.4-era API: the backend class is selected via language_model_type.
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    language_model_type=lx.inference.OllamaLanguageModel,
    model_id="gemma2:2b",
    model_url="http://localhost:11434",  # local Ollama server
    fence_output=False,
    use_schema_constraints=False,
)
```
Quick setup: Install Ollama from ollama.com, run `ollama pull gemma2:2b`, then `ollama serve`.
For detailed installation, Docker setup, and more examples, see `examples/ollama/`.
Full Changelog: v1.0.3...v1.0.4
v1.0.3 - OpenAI language model support
What's Changed
- Added OpenAI language model integration – Support for GPT-4o, GPT-4o-mini, and other OpenAI models
- Enhanced documentation – Added OpenAI usage examples and API key setup instructions to README
- Comprehensive test coverage – Added unit tests for OpenAI backend
Technical Details
- Uses modern OpenAI v1.x client API with parallel processing support
- Note: Schema constraints for OpenAI are not yet implemented (use `use_schema_constraints=False`)
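A minimal sketch of OpenAI usage under these constraints, following the same `language_model_type` pattern as the v1.0.4 example above; `input_text`, `prompt`, `examples`, and the model name are placeholders:

```python
import langextract as lx

# No schema constraints for OpenAI in this release, so disable them and
# have the model emit fenced JSON that LangExtract parses from the output.
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    language_model_type=lx.inference.OpenAILanguageModel,
    model_id="gpt-4o-mini",  # illustrative model name
    fence_output=True,
    use_schema_constraints=False,
)
```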
Full Changelog: v1.0.2...v1.0.3
v1.0.2: Removes pylibmagic dependency
v1.0.2 – Slimmer install, Windows fix, OpenAI v1.x support
What’s Changed
- Removed `langfun` and `pylibmagic` dependencies – lighter install; no `libmagic` needed on Windows
- Fixed Windows installation failure (#25)
- Restored compatibility with modern OpenAI SDK v1.x (#16)
- Updated README and Dockerfile to match the new, slimmer dependency set
Note: `LangFunLanguageModel` has been removed.
If you still need LangFun support, please open a new issue so we can discuss re-adding it in a cross-platform way.
Full Changelog: v1.0.1...v1.0.2
v1.0.1: Fix libmagic dependency issue
What's Changed
- Fixed libmagic ImportError by adding pylibmagic dependency (#6)
- Added `[full]` install option for easier setup
- Added Docker support with pre-installed libmagic
- Updated troubleshooting documentation
Bug Fixes
- Resolve "failed to find libmagic" error when importing langextract (#6)
Installation
```bash
# Standard install (now includes pylibmagic)
pip install langextract

# Full install (explicitly installs all dependencies)
pip install langextract[full]

# Docker (libmagic pre-installed)
docker run --rm -e LANGEXTRACT_API_KEY="your-key" langextract python script.py
```
Full Changelog: v1.0.0...v1.0.1
LangExtract v1.0.0 - Structured Information Extraction
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.
Key Features
- Extract structured data from any text using few-shot examples
- Support for Gemini and Ollama models
- Interactive HTML visualizations with source highlighting
- Optimized for long documents with parallel processing and multiple extraction passes
- Precise source grounding - every extraction maps to its location in the original text
Installation
```bash
pip install langextract
```
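A compact sketch of the core workflow described above; the character-extraction task and `gemini-2.5-flash` model ID are illustrative, not mandated by this release:

```python
import langextract as lx

# One few-shot example teaches the task; extractions quote the source text.
examples = [
    lx.data.ExampleData(
        text="ROMEO. But soft! What light through yonder window breaks?",
        extractions=[
            lx.data.Extraction(extraction_class="character", extraction_text="ROMEO"),
        ],
    )
]

result = lx.extract(
    text_or_documents="JULIET. O Romeo, Romeo! Wherefore art thou Romeo?",
    prompt_description="Extract character names exactly as they appear.",
    examples=examples,
    model_id="gemini-2.5-flash",
)

# Source grounding: each extraction records where it occurs in the input.
for extraction in result.extractions:
    print(extraction.extraction_class, extraction.extraction_text, extraction.char_interval)
```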
See the documentation for full usage examples.