A FastAPI-based search engine for Ossetian dictionaries with advanced transliteration support and typo tolerance.
- Unified Search Endpoint: A single /search-html endpoint that accepts both GET and POST requests
- Multilingual Support: Search in English, Russian, and Ossetian with typo tolerance
- Advanced Transliteration: Academic-grade transliteration between Latin and Cyrillic scripts for Ossetian terms
- Typo Tolerance: Multi-layered approach to handle spelling variations and common errors
- Contextual Results: Three levels of result context to fit different research needs
- Source Filtering: Ability to search within specific dictionaries
- Comprehensive Dictionary Collection: Access to 16 dictionaries covering historical, etymological, and specialized content
- OpenAPI Documentation: Full documentation with usage examples and character mappings
The API provides a simplified interface for search queries:
GET /search-html/тæрхъус
Parameters:
- query: The search term, passed as part of the path (required)
- limit: Maximum number of results (optional, default: 10, max: 50)
- transliteration: Enable or disable transliteration (optional, default: true)
- context_size: Amount of context to return (optional, default: "default"; options: "default", "expanded", "full")
- source: Filter results by source dictionary (optional)
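As a sketch of how a client might assemble these GET requests, the helper below percent-encodes the Cyrillic path segment and appends only the options that differ from their defaults. The function name build_search_url and the lower bound of 1 for limit are assumptions for illustration; the defaults, the maximum of 50, and the context_size options come from the list above.

```python
from urllib.parse import quote, urlencode

ALLOWED_CONTEXT_SIZES = {"default", "expanded", "full"}
MAX_LIMIT = 50


def build_search_url(base_url, query, limit=10, transliteration=True,
                     context_size="default", source=None):
    """Build a GET /search-html/{query} URL, sending only non-default options."""
    # The minimum of 1 is an assumption; only the maximum (50) is documented.
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if context_size not in ALLOWED_CONTEXT_SIZES:
        raise ValueError(f"context_size must be one of {sorted(ALLOWED_CONTEXT_SIZES)}")

    params = {}
    if limit != 10:
        params["limit"] = limit
    if not transliteration:
        params["transliteration"] = "false"
    if context_size != "default":
        params["context_size"] = context_size
    if source is not None:
        params["source"] = source

    # Percent-encode the query so Cyrillic letters and æ survive as one path segment.
    url = f"{base_url.rstrip('/')}/search-html/{quote(query, safe='')}"
    if params:
        # quote_via=quote encodes spaces as %20, e.g. in multi-word source names.
        url += "?" + urlencode(params, quote_via=quote)
    return url


print(build_search_url("http://localhost:8100", "тæрхъус", context_size="expanded", source="Абаев"))
```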
Examples with optional parameters:
GET /search-html/тæрхъус?limit=5
GET /search-html/тæрхъус?transliteration=false
GET /search-html/тæрхъус?context_size=expanded
GET /search-html/тæрхъус?source=Абаев
The API also supports query parameters for backward compatibility:
GET /search-html?query=тæрхъус
The same endpoint also accepts a JSON body via POST:
POST /search-html
Content-Type: application/json
{
"query": "тæрхъус"
}
The POST method is recommended for complex queries and non-ASCII characters.
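A minimal sketch of preparing that POST request with only the Python standard library. build_search_request is a hypothetical helper, not part of this repository; it builds the request object without sending it, and the commented-out urlopen call marks where a running server would come in.

```python
import json
import urllib.request


def build_search_request(base_url, query, **options):
    """Prepare (but do not send) a POST /search-html request with a JSON body."""
    body = {"query": query, **options}
    # ensure_ascii=False keeps Cyrillic and æ readable; the body is sent as UTF-8 bytes.
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/search-html",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("http://localhost:8100", "тæрхъус", limit=5)
# Against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```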
Examples with optional parameters:
{
"query": "тæрхъус",
"limit": 5
}
{
"query": "тæрхъус",
"transliteration": false
}
{
"query": "тæрхъус",
"context_size": "expanded",
"source": "Толковый словарь"
}
The API provides a health check endpoint to verify the operational status of both the API and the search engine:
GET /health
Response:
{
"status": "healthy",
"message": "API and search engine are operational"
}
If the search engine is unavailable, the endpoint returns a 503 Service Unavailable status code.
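A monitoring script could reduce this contract to a yes/no answer. The sketch below assumes only what is documented here (200 with "status": "healthy" when everything works, 503 when the search engine is down); the name api_is_healthy is hypothetical. A cron job or external monitor could call api_is_healthy("http://localhost:8100") and alert on False.

```python
import json
import urllib.error
import urllib.request


def api_is_healthy(base_url, timeout=5.0):
    """Return True only if GET /health answers 200 with "status": "healthy"."""
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
            return resp.status == 200 and isinstance(payload, dict) and payload.get("status") == "healthy"
    except urllib.error.HTTPError:
        # e.g. 503 Service Unavailable: the API is up but the search engine is not.
        return False
    except (OSError, ValueError):
        # Connection refused or timed out, or the body was not valid JSON.
        return False
```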
The API implements a sophisticated transliteration system between Latin and Cyrillic scripts for Ossetian terms. This means you can search for terms like "тæрхъус" or "tærqūs" and find relevant results regardless of which script was used in the original dictionary.
The system supports academic-grade transliteration conventions:
- Both æ and ä forms are recognized (æ is transcribed as ä in some scholarly works)
- Ejective (glottalized) consonants marked with apostrophes (k', p', t', c')
- Specialized notation for labialized velar consonants (хъуыд/kẜyd, гъуыр/gẜyr, къуым/k'ẜym)
- Support for specialized characters like ә for Cyrillic у
- Indo-European palatalized sounds represented as ḱ, ǵ
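To illustrate the idea behind script conversion, here is a greedy longest-match transliterator. Its table is a deliberately tiny, hypothetical subset built only from the тæрхъус/tærqūs pair quoted above; it does not reproduce the project's academic conventions (ä, ejectives, labialized velars, ḱ/ǵ). Trying the longest key first matters because, in a fuller table, a digraph such as хъ must win over a mapping for х alone.

```python
# Hypothetical, deliberately tiny Cyrillic -> Latin table derived only from
# the "тæрхъус" / "tærqūs" example; the project's real table is far larger.
CYR_TO_LAT = {
    "хъ": "q",
    "т": "t",
    "æ": "æ",
    "р": "r",
    "у": "ū",
    "с": "s",
}


def transliterate(text, table=CYR_TO_LAT):
    """Greedy longest-match transliteration; unknown characters pass through."""
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        for size in range(longest, 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No key matched at this position: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Reverse table for Latin -> Cyrillic lookups (bidirectional conversion).
LAT_TO_CYR = {lat: cyr for cyr, lat in CYR_TO_LAT.items()}

print(transliterate("тæрхъус"))             # tærqūs
print(transliterate("tærqūs", LAT_TO_CYR))  # тæрхъус
```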
The system includes multi-layered typo tolerance:
- Character Variant Generation: Automatically creates common spelling variants
  - æ/ä → a, e
  - ū → u
  - š → sh
  - And more...
- Bidirectional Script Conversion: A search in either script finds matches in both
- Special Case Handling: Words with irregular transliterations have explicit mappings
- Meilisearch Engine: Built-in tolerance for character transpositions and missing or extra letters
Transliteration is enabled by default but can be disabled by setting the transliteration parameter to false.
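The character-variant step can be sketched as a Cartesian product over per-character alternatives. Only the substitutions listed above are included (the "And more..." rules are not reproduced), and the max_variants cap is an assumption added so that words with many æ characters do not explode combinatorially.

```python
from itertools import product

# Substitutions named above: æ/ä -> a, e; ū -> u; š -> sh.
VARIANTS = {
    "æ": ["a", "e"],
    "ä": ["a", "e"],
    "ū": ["u"],
    "š": ["sh"],
}


def spelling_variants(word, max_variants=32):
    """Return the word itself plus combinations of the substitutions above."""
    # For each character, the options are the character itself plus any replacements.
    options = [[ch] + VARIANTS.get(ch, []) for ch in word]
    variants = []
    for combo in product(*options):
        variants.append("".join(combo))
        if len(variants) >= max_variants:  # guard against combinatorial blow-up
            break
    return variants


print(spelling_variants("tærqūs"))
# ['tærqūs', 'tærqus', 'tarqūs', 'tarqus', 'terqūs', 'terqus']
```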
The search covers the following 16 dictionaries:
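- Историко-этимологический словарь осетинского языка (Абаев В.И.), ТОМ 1 (A-K) - 1958
- Историко-этимологический словарь осетинского языка (Абаев В.И.), ТОМ 2 (L-R) - 1973
- Историко-этимологический словарь осетинского языка (Абаев В.И.), ТОМ 3 (S-T) - 1979
- Историко-этимологический словарь осетинского языка (Абаев В.И.), ТОМ 4 (U-Z) - 1989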
- Толковый словарь осетинского языка, Том 1 (ред. Габараев Н.Я.) - 2007
- Толковый словарь осетинского языка, Том 2 (ред. Габараев Н.Я.) - 2010
- Осетинские пословицы и поговорки - 1976, 1977
- Осетинские пословицы и поговорки (Айларов И.Х.) - 2006
- Осетинские дигорские народные изречения - 2011
- Названия растений в осетинском языке (Техов Ф.Д.) - 1979
- Лексика народной медицины осетин (Дзабиев З.Т.) - 1981
- Народная медицинская терминология осетин (Джабиев З.П.) - 2018
- Краткий словарь литературных терминов - 1971
- Происхождение фамилий Дигорского ущелья (Гецати А.А.) - 1999
- Осетинские фамилии (Гаглоева З.Д.) - 2017
Full API documentation is available in OpenAPI format. You can view the interactive documentation by:
- Starting the server
- Navigating to http://htmldicts.setia.dev:8100/ in your browser
Alternatively, view the OpenAPI specification directly at /openapi.yaml
To run a local development instance:
- Clone this repository
- Install dependencies:
  pip install -r requirements.txt
- Start the Meilisearch server:
  docker run -p 7700:7700 getmeili/meilisearch
- Run the API server:
  uvicorn app.api.api:app --host 0.0.0.0 --port 8100 --reload
- Open http://localhost:8100/ (or the public instance at http://htmldicts.setia.dev:8100/) in your browser to access the API documentation
This section provides comprehensive instructions for deploying and maintaining the dictionary server in production environments.
The recommended way to deploy this service is using Docker Compose, which manages both the application and Meilisearch in a containerized environment.
Prerequisites:
- Docker and Docker Compose installed on your system
- The repository cloned to your server
- Start the services:
  docker-compose up -d
  This command starts both the application server and Meilisearch in the background.
- Index the dictionaries (first-time setup or reindexing):
  # Make the script executable
  chmod +x index-with-docker.sh
  # Run the indexing script
  ./index-with-docker.sh
- Access the API:
  - API endpoints are available at http://localhost:8100/
  - The Meilisearch admin interface is available at http://localhost:7701/
- View logs:
  docker-compose logs -f app          # Application logs
  docker-compose logs -f meilisearch  # Meilisearch logs
- Stop the services:
  docker-compose down
- Restart the services:
  docker-compose restart
- Rebuild and restart (after code changes):
  docker-compose up -d --build
Alternatively, you can run the components manually. The server can be started in either of two ways:
- With the helper script, which starts the server at http://0.0.0.0:8000 by default:
  python run_server.py
- With uvicorn directly:
  uvicorn app.api.api:app --host 0.0.0.0 --port 8100
For production deployments, consider:
- Adding the --workers parameter to specify the number of worker processes (e.g., --workers 4)
- Removing the --reload flag used during development (uvicorn does not allow --reload and --workers together)
- Setting up a process manager such as Supervisor or systemd so that the server restarts automatically
To index dictionaries for the first time:
- Ensure Meilisearch is running on the default port (7700):
  docker run -d --name meilisearch -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch
- Run the indexer script:
  python run_indexer.py
This will scan the Dicts/ directory for HTML dictionary files and index them into Meilisearch. The process may take several minutes depending on the number and size of dictionaries.
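The scanning step could look roughly like the sketch below, which assumes each dictionary is a UTF-8 *.html file directly inside Dicts/ whose entries live in <p> elements. The document fields (id, source, position, text) are hypothetical, chosen to line up with the source filter and context features described in this README; run_indexer.py may structure its data differently.

```python
import hashlib
from html.parser import HTMLParser
from pathlib import Path


class ParagraphExtractor(HTMLParser):
    """Collect the whitespace-normalized text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # list of text chunks while inside a <p>, else None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            text = " ".join("".join(self._buffer).split())
            if text:  # skip empty paragraphs
                self.paragraphs.append(text)
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def build_documents(dicts_dir="Dicts"):
    """Turn every *.html file in dicts_dir into one document per paragraph."""
    documents = []
    for path in sorted(Path(dicts_dir).glob("*.html")):
        parser = ParagraphExtractor()
        parser.feed(path.read_text(encoding="utf-8"))
        parser.close()
        # Meilisearch document ids may only contain a-z, A-Z, 0-9, '-' and '_',
        # so a Cyrillic filename is hashed instead of being used directly.
        file_id = hashlib.sha1(path.name.encode("utf-8")).hexdigest()[:12]
        for position, text in enumerate(parser.paragraphs):
            documents.append({
                "id": f"{file_id}-{position}",
                "source": path.stem,    # what a source filter could match against
                "position": position,   # lets neighboring paragraphs serve as context
                "text": text,
            })
    return documents


# Sending the documents with the official Python client would look roughly like
# this (the index name "dictionaries" is hypothetical):
#   import meilisearch
#   meilisearch.Client("http://localhost:7700").index("dictionaries").add_documents(build_documents())
```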
To reindex dictionaries (e.g., after adding new dictionaries or updating existing ones):
- Verify that Meilisearch is running:
  curl http://localhost:7700/health
- Run the indexer script:
  python run_indexer.py
The indexer will automatically handle the process of updating the search index with any new or modified content.
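Before reindexing, a script can wait for Meilisearch instead of failing immediately. Meilisearch's GET /health returns {"status": "available"} once the instance is ready; the polling helper below (a hypothetical name, with assumed default timeout and interval) returns True as soon as that happens and False if the deadline passes. A caller would typically run python run_indexer.py only when it returns True.

```python
import json
import time
import urllib.request


def wait_for_meilisearch(url="http://localhost:7700", timeout=30.0, interval=1.0):
    """Poll Meilisearch's GET /health until it reports "available" or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{url.rstrip('/')}/health", timeout=interval) as resp:
                payload = json.loads(resp.read().decode("utf-8"))
                if isinstance(payload, dict) and payload.get("status") == "available":
                    return True
        except (OSError, ValueError):
            pass  # not reachable yet (refused, timed out, HTTP error) or not JSON
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```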
For production deployments, consider the following:
- Persistent Meilisearch data: Use volumes to persist Meilisearch data across container restarts:
  docker run -d --name meilisearch -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch
- API Key Security: Configure Meilisearch with API keys for production use:
  docker run -d --name meilisearch -p 7700:7700 -v $(pwd)/meili_data:/meili_data -e MEILI_MASTER_KEY=YOUR_MASTER_KEY getmeili/meilisearch
- HTTPS: Use a reverse proxy such as Nginx or Traefik to terminate HTTPS in front of the API.
- Logging: Configure logging for both the FastAPI application and Meilisearch so that you can troubleshoot issues.
- Monitoring: Set up health checks to monitor the status of both the API and Meilisearch.
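Once Meilisearch is started with MEILI_MASTER_KEY, requests to it must carry an Authorization: Bearer header. The sketch below assumes the key reaches the application through an environment variable named MEILISEARCH_API_KEY; that name is hypothetical, so check how this application actually reads its key. Meilisearch's own guidance is to reserve the master key for managing API keys and to give applications a derived search or admin key.

```python
import os
import urllib.request


def meilisearch_request(path, base_url="http://localhost:7700", env_var="MEILISEARCH_API_KEY"):
    """Prepare a GET request to Meilisearch, adding a Bearer token if one is configured."""
    # MEILISEARCH_API_KEY is a hypothetical variable name used only for this sketch.
    request = urllib.request.Request(f"{base_url.rstrip('/')}/{path.lstrip('/')}")
    key = os.environ.get(env_var)
    if key:
        request.add_header("Authorization", f"Bearer {key}")
    return request
```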
A sample Python client is provided in client_example.py to demonstrate how to use the API. Run it with:
  python client_example.py "тæрхъус"
Additional options:
- --method get|post|both: Choose the HTTP method (default: both)
- --limit N: Set the maximum number of results (default: 5)
- --no-transliteration: Disable transliteration
- --health: Check API health status
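For reference, the documented flags map onto argparse roughly as sketched below. This is a reconstruction from the option list, not the code of client_example.py; in particular, making the query optional when --health is given is an assumption.

```python
import argparse


def parse_client_args(argv=None):
    """Parse options matching those listed for client_example.py."""
    parser = argparse.ArgumentParser(description="Query the Ossetian dictionary search API.")
    parser.add_argument("query", nargs="?", help="search term, e.g. тæрхъус")
    parser.add_argument("--method", choices=["get", "post", "both"], default="both")
    parser.add_argument("--limit", type=int, default=5, help="maximum number of results")
    # --no-transliteration maps to the API parameter transliteration=false.
    parser.add_argument("--no-transliteration", dest="transliteration",
                        action="store_false", help="disable transliteration")
    parser.add_argument("--health", action="store_true", help="only check GET /health")
    args = parser.parse_args(argv)
    if args.query is None and not args.health:
        parser.error("a query is required unless --health is given")
    return args
```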
The API provides three levels of context for search results:
- default: Shows only the entry definition with no additional context
- expanded: Includes approximately 2 paragraphs before and after the definition, providing additional context
- full: Returns approximately 5 paragraphs before and after the definition, showing a more complete section of the dictionary
This feature is particularly useful for scholarly research where understanding the surrounding context of a dictionary entry is important.
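The three levels can be pictured as a window of N paragraphs on either side of the matching entry, clipped at the edges of the dictionary. The sketch below uses the approximate counts quoted above (0, 2, 5); how the server actually selects neighboring paragraphs is not specified here, and context_window is a hypothetical name.

```python
# Approximate paragraph counts from the list above; the server's exact rule may differ.
CONTEXT_PARAGRAPHS = {"default": 0, "expanded": 2, "full": 5}


def context_window(paragraphs, entry_index, context_size="default"):
    """Return the entry paragraph plus N neighbors on each side, clipped to the document."""
    n = CONTEXT_PARAGRAPHS[context_size]
    start = max(0, entry_index - n)
    end = min(len(paragraphs), entry_index + n + 1)
    return paragraphs[start:end]


paras = [f"p{i}" for i in range(10)]
print(context_window(paras, 4, "expanded"))  # ['p2', 'p3', 'p4', 'p5', 'p6']
print(context_window(paras, 1, "full"))      # ['p0', 'p1', 'p2', 'p3', 'p4', 'p5', 'p6']
```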
You can filter results to come from a specific dictionary by using the source parameter. The value is matched against the dictionary filename, so you can use partial names like "Абаев" to match any of Abayev's dictionaries or the full filename to target a specific dictionary.
Example (when sending this request, percent-encode the spaces in the source value as %20):
GET /search-html/хуым?source=ИСТОРИКО-ЭТИМОЛОГИЧЕСКИЙ СЛОВАРЬ ОСЕТИНСКОГО ЯЗЫКА - ТОМ 4
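On the client side, the same partial-name rule could be expressed as a substring test on each result's source filename. The result shape (a dict with a source key), the sample filenames, and the case-insensitive comparison are assumptions for this sketch, not documented behavior of the API.

```python
def filter_by_source(results, source=None):
    """Keep results whose source filename contains `source` as a substring."""
    if not source:
        return list(results)
    # Case-insensitive matching is an assumption made for this sketch.
    needle = source.casefold()
    return [r for r in results if needle in r["source"].casefold()]


# Hypothetical filenames, for illustration only.
sample = [
    {"source": "ИСТОРИКО-ЭТИМОЛОГИЧЕСКИЙ СЛОВАРЬ ОСЕТИНСКОГО ЯЗЫКА - ТОМ 4 (Абаев)"},
    {"source": "Толковый словарь осетинского языка, Том 1"},
]
print(filter_by_source(sample, "Абаев"))
```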