Releases: google/langextract
v1.0.9
What's New
Features
- Prompt alignment validation for few-shot examples (#215)
  - Validates that example extractions exist in their source text
  - Three modes: OFF, WARNING (default), ERROR
  - New parameters: `prompt_validation_level` and `prompt_validation_strict` (see the sketch after this list)
- Vertex AI authentication support for Gemini provider (#60)
- llama-cpp-python community provider added (#202)
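For illustration, a minimal sketch of the new validation parameters. The string value for `prompt_validation_level` is assumed to mirror the OFF/WARNING/ERROR modes above, and the example data and `gemini-2.5-flash` model ID are placeholders, not part of this release:

```python
import langextract as lx

examples = [
    lx.data.ExampleData(
        text="Patient was given 250 mg ibuprofen.",
        extractions=[
            lx.data.Extraction(
                extraction_class="medication",
                extraction_text="ibuprofen",  # appears verbatim in the example text, so it aligns
            )
        ],
    )
]

# With ERROR-level validation, an example whose extraction_text is absent
# from its source text would raise instead of merely logging a warning.
result = lx.extract(
    text_or_documents="Patient took 500 mg of acetaminophen at noon.",
    prompt_description="Extract medication mentions.",
    examples=examples,
    model_id="gemini-2.5-flash",
    prompt_validation_level="ERROR",  # assumed string form of the ERROR mode
)
```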
Improvements
- Changed the default to `debug=False` in `extract()` for cleaner output
- Fixed router typings for provider plugins (#190)
- Allow T-prefixed TypeVars in pylint (#194)
Full Changelog: v1.0.8...v1.0.9
v1.0.8
What's Changed
Features
- Ollama timeout improvements (#154)
  - Increased default timeout from 30s to 120s
  - Made timeout configurable via ModelConfig (sketched after this list)
  - Fixed kwargs not being passed through
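A hedged sketch of the configurable timeout, assuming `lx.factory.ModelConfig` forwards provider kwargs to the Ollama provider. The `timeout` kwarg name follows the fix above, and the model ID and URL are illustrative:

```python
import langextract as lx

# Route provider-specific options (here, a 300 s timeout overriding the
# new 120 s default) through ModelConfig rather than hard-coding them.
config = lx.factory.ModelConfig(
    model_id="gemma2:2b",
    provider_kwargs={
        "model_url": "http://localhost:11434",
        "timeout": 300,  # seconds; name assumed from the kwargs fix above
    },
)

result = lx.extract(
    text_or_documents="...",   # your input text
    prompt_description="...",  # your task description
    examples=[],               # your few-shot examples
    config=config,
)
```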
Documentation
- Improved visualization examples for Jupyter/Colab (#153)
  - Added Romeo & Juliet Colab notebook
Full Changelog: v1.0.7...v1.0.8
v1.0.7
What's New
- Debug logging support when `debug=True` in `lx.extract()` (#142)
- GPT-5 model registration fixes (#143)
- Improved documentation for provider plugins and schema support
- Automated plugin generator script for external providers
- Base URL support for OpenAI-compatible endpoints (#138)
See the full changelog for details.
v1.0.6 - Custom Model Provider Plugins & Schema System Refactor
Major Features
Custom Model Provider Plugin Support
- New provider registry infrastructure for extending LangExtract with custom LLM providers
- Plugin discovery via entry points allows third-party packages to register providers
- Example implementation available at examples/custom_provider_plugin
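As a rough sketch of what a third-party provider might look like. The `registry.register` decorator, `BaseLanguageModel` base class, and `ScoredOutput` type are assumptions modeled on the bundled example; see `examples/custom_provider_plugin` for the authoritative shape:

```python
from langextract import inference
from langextract.providers import registry

# Assumed plugin surface: model IDs matching the pattern route to this
# provider, and a package would expose the class via a
# "langextract.providers" entry point so it is discovered automatically.
@registry.register(r'^my-model')
class MyProvider(inference.BaseLanguageModel):

    def __init__(self, model_id='my-model-v1', **kwargs):
        super().__init__()
        self.model_id = model_id

    def infer(self, batch_prompts, **kwargs):
        # One list of scored candidates per input prompt.
        for _prompt in batch_prompts:
            yield [inference.ScoredOutput(score=1.0, output='{"extractions": []}')]
```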
Schema System Refactor
- Refactored schema system to support provider-specific schema implementations
- Providers can now define their own schema constraints and validation
- Better separation of concerns between core schema logic and provider implementations
Enhancements
- Ollama Provider: Added support for Hugging Face style model IDs (e.g., `meta-llama/Llama-3.2-1B-Instruct`)
- Extract API: Added `model` and `config` parameters to `extract()` for more flexible model configuration
- Examples: Updated Ollama quickstart to demonstrate the ModelConfig pattern with JSON mode
- Testing: Improved test infrastructure for provider registry and plugin system
Bug Fixes
- Fixed lazy loading for provider pattern registration
- Fixed unicode escaping in example generation
- Fixed test failures related to provider registry initialization
Installation
```bash
pip install langextract==1.0.6
```
Full Changelog: v1.0.5...v1.0.6
LangExtract v1.0.5
What's Changed
Bug Fixes
- Fix chunking bug when newlines fall at chunk boundaries (#88)
  - Resolves issue where content was incorrectly chunked when newline characters appeared at chunk boundaries
- Fix IPython import warnings and improve notebook detection (#86)
  - Eliminates import warnings in Jupyter notebooks and improves compatibility
New Features
- Add `base_url` parameter to `OpenAILanguageModel` (#51)
  - Enables using custom OpenAI-compatible endpoints for alternative LLM providers
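A brief sketch of the new parameter, assuming an OpenAI-compatible server is listening locally; the endpoint URL, key, and model name are illustrative:

```python
from langextract import inference

# Point the OpenAI client at a custom OpenAI-compatible endpoint.
model = inference.OpenAILanguageModel(
    model_id="gpt-4o-mini",
    api_key="sk-placeholder",  # many local servers ignore the key
    base_url="http://localhost:8000/v1",
)
```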
Full Changelog: v1.0.4...v1.0.5
v1.0.4 - Ollama integration and improvements
What's Changed
- Added Ollama language model integration – Full support for local LLMs via Ollama
- Docker deployment support – Production-ready docker-compose setup with health checks
- Comprehensive examples – Quickstart script and detailed documentation in `examples/ollama/`
- Fixed `OllamaLanguageModel` parameter – Changed from `model` to `model_id` for consistency (#57)
- Enhanced CI/CD – Added Ollama integration tests that run on every PR
- Improved documentation – Consistent API examples across all language models
Technical Details
- Supports all Ollama models (gemma2:2b, llama3.2, mistral, etc.)
- Secure setup with localhost-only binding by default
- Integration tests use lightweight models for faster CI runs
- Docker setup includes automatic model pulling and health checks
Usage Example
```python
import langextract as lx

# v1.0.4-era API: the backend class is selected via language_model_type.
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    language_model_type=lx.inference.OllamaLanguageModel,
    model_id="gemma2:2b",
    model_url="http://localhost:11434",  # local Ollama server
    fence_output=False,
    use_schema_constraints=False,
)
```
Quick setup: Install Ollama from ollama.com, run `ollama pull gemma2:2b`, then `ollama serve`.
For detailed installation, Docker setup, and more examples, see `examples/ollama/`.
Full Changelog: v1.0.3...v1.0.4
v1.0.3 - OpenAI language model support
What's Changed
- Added OpenAI language model integration – Support for GPT-4o, GPT-4o-mini, and other OpenAI models
- Enhanced documentation – Added OpenAI usage examples and API key setup instructions to README
- Comprehensive test coverage – Added unit tests for OpenAI backend
Technical Details
- Uses modern OpenAI v1.x client API with parallel processing support
- Note: Schema constraints for OpenAI are not yet implemented (use `use_schema_constraints=False`)
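A minimal sketch of OpenAI usage under these constraints, following the same `language_model_type` pattern as the v1.0.4 example above; `input_text`, `prompt`, `examples`, and the model name are placeholders:

```python
import langextract as lx

# No schema constraints for OpenAI in this release, so disable them and
# have the model emit fenced JSON that LangExtract parses from the output.
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    language_model_type=lx.inference.OpenAILanguageModel,
    model_id="gpt-4o-mini",  # illustrative model name
    fence_output=True,
    use_schema_constraints=False,
)
```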
Full Changelog: v1.0.2...v1.0.3
v1.0.2: Removes pylibmagic dependency
v1.0.2 – Slimmer install, Windows fix, OpenAI v1.x support
What’s Changed
- Removed `langfun` and `pylibmagic` dependencies – lighter install; no `libmagic` needed on Windows
- Fixed Windows installation failure (#25)
- Restored compatibility with modern OpenAI SDK v1.x (#16)
- Updated README and Dockerfile to match the new, slimmer dependency set
Note: `LangFunLanguageModel` has been removed.
If you still need LangFun support, please open a new issue so we can discuss re-adding it in a cross-platform way.
Full Changelog: v1.0.1...v1.0.2
v1.0.1: Fix libmagic dependency issue
What's Changed
- Fixed libmagic ImportError by adding pylibmagic dependency (#6)
- Added `[full]` install option for easier setup
- Added Docker support with pre-installed libmagic
- Updated troubleshooting documentation
Bug Fixes
- Resolve "failed to find libmagic" error when importing langextract (#6)
Installation
```bash
# Standard install (now includes pylibmagic)
pip install langextract

# Full install (explicitly installs all dependencies)
pip install langextract[full]

# Docker (libmagic pre-installed)
docker run --rm -e LANGEXTRACT_API_KEY="your-key" langextract python script.py
```
Full Changelog: v1.0.0...v1.0.1
LangExtract v1.0.0 - Structured Information Extraction
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.
Key Features
- Extract structured data from any text using few-shot examples
- Support for Gemini and Ollama models
- Interactive HTML visualizations with source highlighting
- Optimized for long documents with parallel processing and multiple extraction passes
- Precise source grounding - every extraction maps to its location in the original text
Installation
```bash
pip install langextract
```
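A compact sketch of the core workflow described above; the character-extraction task and `gemini-2.5-flash` model ID are illustrative, not mandated by this release:

```python
import langextract as lx

# One few-shot example teaches the task; extractions quote the source text.
examples = [
    lx.data.ExampleData(
        text="ROMEO. But soft! What light through yonder window breaks?",
        extractions=[
            lx.data.Extraction(extraction_class="character", extraction_text="ROMEO"),
        ],
    )
]

result = lx.extract(
    text_or_documents="JULIET. O Romeo, Romeo! Wherefore art thou Romeo?",
    prompt_description="Extract character names exactly as they appear.",
    examples=examples,
    model_id="gemini-2.5-flash",
)

# Source grounding: each extraction records where it occurs in the input.
for extraction in result.extractions:
    print(extraction.extraction_class, extraction.extraction_text, extraction.char_interval)
```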
See the documentation for full usage examples.