# Indexing

A collection of converters for processing Islamic texts, with a focus on fiqh cards and tafsir data.
This project provides tools for converting Islamic jurisprudence (fiqh) documents from DOCX format to structured JSON, using Claude Opus 4.1 for intelligent extraction.

## Features
- AI-Powered Extraction: Uses Claude Opus 4.1 to intelligently parse complex Arabic fiqh texts
- Structured JSON Output: Converts unstructured DOCX tables into well-organized JSON
- Handles Multiple Opinions: Correctly separates and attributes different scholarly positions
- Markdown Preview: View how documents are converted before processing
## Prerequisites

- Python 3.11 or higher
- UV package manager
## Installation

- Clone the repository:

  ```bash
  git clone [repository-url]
  cd indexing
  ```

- Install dependencies using UV:

  ```bash
  uv sync
  ```

- Set up environment variables: create a `.env` file in the project root with your Anthropic API key:

  ```
  ANTHROPIC_API_KEY=your-api-key-here
  ```
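If you call the converter from your own scripts, the key has to reach the process somehow; a common pattern is python-dotenv. A minimal sketch, assuming python-dotenv is installed (check the code for how the project actually loads its configuration):

```python
import os

from dotenv import load_dotenv  # assumes the python-dotenv package is available

load_dotenv()  # reads .env from the current working directory

if not os.environ.get("ANTHROPIC_API_KEY"):
    raise RuntimeError("ANTHROPIC_API_KEY is not set; add it to .env")
```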
## Usage

### Fiqh Card Converter

The fiqh card converter uses Claude Opus 4.1 to extract structured information from Islamic jurisprudence documents.

To see how a DOCX file will be converted to markdown (without calling the API):

```bash
PYTHONPATH=src uv run python -m fiqh_card_converter.claude_cli preview
```

To test the converter on the sample file:

```bash
PYTHONPATH=src uv run python -m fiqh_card_converter.claude_cli test
```

This will:
- Process `sample_input_data/fiqh_cards/sample.docx`
- Save output to `sample_output_data/fiqh_cards/sample_claude.json`
- Display a summary of extracted issues (a sketch for inspecting the saved JSON follows below)
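A minimal sketch of that inspection, assuming the output file holds the extracted issues either as a top-level list or under an `issues` key (check the generated file for its exact shape):

```python
import json
from pathlib import Path

output_path = Path("sample_output_data/fiqh_cards/sample_claude.json")
data = json.loads(output_path.read_text(encoding="utf-8"))

# Normalise the two likely top-level shapes: a bare list or {"issues": [...]}.
issues = data if isinstance(data, list) else data.get("issues", [])

for issue in issues:
    print(f'{issue["issue_number"]}: {issue["question"]}')
    for opinion in issue["opinions"]:
        print("  -", opinion["position"], "|", ", ".join(opinion["scholars"]))
```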
To process all DOCX files in a directory:

```bash
PYTHONPATH=src uv run python -m fiqh_card_converter.claude_cli convert \
    sample_input_data/fiqh_cards \
    sample_output_data/fiqh_cards
```

### Qul Tafsir Converter

The Qul Tafsir converter processes tafsir (Quranic commentary) data and uploads it to Agentset for RAG (Retrieval-Augmented Generation) applications.
To generate individual text files with metadata:

```bash
uv run python -m qul_tafsir.cli convert-agentset ibn-kathir --start-surah 1 --end-surah 115
```

To upload all generated files to Agentset:

```bash
uv run python -m qul_tafsir.cli ingest-agentset ibn-kathir
```

This command:
- Uploads all section files to S3
- Creates a batch ingest job
- Monitors job status until completion
- Supports checkpoint/resume on failure (sketched below)
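The checkpoint/resume behaviour is a standard pattern: record each completed upload, and skip anything already recorded when the job is rerun. A generic sketch of the idea, not the converter's actual code (`upload_file` is a hypothetical stand-in for the real S3 upload step):

```python
import json
from pathlib import Path

CHECKPOINT = Path(".ingest_checkpoint.json")

def upload_file(path: Path) -> None:
    """Hypothetical stand-in for the real S3 upload."""
    print(f"uploading {path}")

def ingest(files: list[Path]) -> None:
    # Resume from whatever a previous run managed to finish.
    done = set(json.loads(CHECKPOINT.read_text())) if CHECKPOINT.exists() else set()
    for path in files:
        if str(path) in done:
            continue  # already uploaded in an earlier run
        upload_file(path)
        done.add(str(path))
        # Persist after every file so a crash loses at most the in-flight upload.
        CHECKPOINT.write_text(json.dumps(sorted(done)))
```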
Environment variables required:

```
AGENTSET_API_TOKEN=your-agentset-token
AGENTSET_NAMESPACE_ID=your-namespace-id
```
If you accidentally upload duplicates, you can clean them up.

Preview duplicates (dry run):

```bash
uv run python -m qul_tafsir.cli deduplicate-agentset
```

Actually delete duplicates (keeps the oldest by default):

```bash
uv run python -m qul_tafsir.cli deduplicate-agentset --no-dry-run
```

Keep the newest instead of the oldest:

```bash
uv run python -m qul_tafsir.cli deduplicate-agentset --no-dry-run --keep newest
```
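Conceptually, deduplication groups uploaded documents, orders each group by creation time, and deletes all but one copy. A stand-alone sketch of that selection logic, with hypothetical record data and assuming duplicates are matched by document name (the actual CLI talks to the Agentset API):

```python
from collections import defaultdict

def duplicates_to_delete(records: list[tuple[str, str, str]], keep: str = "oldest") -> list[str]:
    """Return document IDs to delete, keeping one copy per document name."""
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for name, created_at, doc_id in records:
        groups[name].append((created_at, doc_id))
    to_delete = []
    for copies in groups.values():
        copies.sort()  # ISO-8601 timestamps sort chronologically as strings
        keep_index = 0 if keep == "oldest" else len(copies) - 1
        to_delete += [doc_id for i, (_, doc_id) in enumerate(copies) if i != keep_index]
    return to_delete

# Hypothetical records: (document_name, created_at, document_id).
records = [
    ("surah-001-section-1.txt", "2025-01-01T10:00:00Z", "doc-a"),
    ("surah-001-section-1.txt", "2025-01-02T10:00:00Z", "doc-b"),
]
print(duplicates_to_delete(records))            # ['doc-b'] (keeps the oldest)
print(duplicates_to_delete(records, "newest"))  # ['doc-a']
```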
## Output Schema

The converter extracts the following information from each fiqh issue:

```json
{
"issue_number": 1,
"question": "The main fiqh question",
"context": "Background and context of the disagreement",
"opinions": [
{
"position": "The opinion/ruling",
"scholars": ["Scholar1", "Scholar2"]
}
],
"disagreement_reason": "Why scholars disagree",
"evidence": {
"Evidence_1": "Quranic verses and hadiths",
"Evidence_2": "Additional proofs"
},
"preferred_opinion": "The strongest opinion",
"practical_impact": "Real-world implications",
"references": "Source references"
}
```

## Project Structure

```
indexing/
├── src/
│   ├── fiqh_card_converter/
│   │   ├── __init__.py
│   │   ├── claude_converter.py   # Core converter using Claude AI
│   │   └── claude_cli.py         # Command-line interface
│   └── qul_tafsir/               # Tafsir converter (to be ported to Goodmem)
├── sample_input_data/
│   └── fiqh_cards/               # Sample DOCX files
├── sample_output_data/
│   └── fiqh_cards/               # Generated JSON output
├── .env                          # Environment variables (create this)
├── pyproject.toml                # Project configuration
└── README.md                     # This file
```
## How It Works

1. Table Extraction: Reads tables from DOCX files containing fiqh issues
2. Markdown Conversion: Converts the table data to clean markdown format (steps 1 and 2 are sketched after this list)
3. AI Processing: Sends the markdown to Claude Opus 4.1 with a structured prompt
4. JSON Generation: Claude extracts and returns structured JSON data
5. Validation: Ensures all required fields are present in the output
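Steps 1 and 2 can be approximated in a few lines; a minimal sketch assuming the tables are read with python-docx (see `claude_converter.py` for the real implementation):

```python
from docx import Document  # pip install python-docx

def tables_to_markdown(path: str) -> str:
    """Render every table in a DOCX file as a pipe-delimited markdown table."""
    doc = Document(path)
    blocks = []
    for table in doc.tables:
        rows = [[cell.text.strip() for cell in row.cells] for row in table.rows]
        if not rows:
            continue
        lines = ["| " + " | ".join(rows[0]) + " |",
                 "| " + " | ".join("---" for _ in rows[0]) + " |"]
        lines += ["| " + " | ".join(r) + " |" for r in rows[1:]]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)

print(tables_to_markdown("sample_input_data/fiqh_cards/sample.docx"))
```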
## API Requirements

This project requires an Anthropic API key with access to Claude Opus 4.1. The model configuration is:

- Model ID: `claude-opus-4-1-20250805`
- Max tokens: 16384
- Temperature: 0 (for consistent output)
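With the Anthropic Python SDK, a call using those settings looks roughly like the following; this is a sketch of the AI-processing step, not the converter's exact code, and the prompt content is a placeholder:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-1-20250805",
    max_tokens=16384,
    temperature=0,  # deterministic output for repeatable extraction
    messages=[{"role": "user", "content": "<markdown of the fiqh card goes here>"}],
)
print(response.content[0].text)
```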
## Development

The converter is designed to be extensible. To modify the extraction schema, edit `FIQH_ISSUE_SCHEMA` in `claude_converter.py`.
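For example, assuming `FIQH_ISSUE_SCHEMA` is a JSON-Schema-style dict with `properties` and `required` keys (check the module for its actual shape), adding a field might look like this; `madhhab_summary` is a hypothetical addition:

```python
# Hypothetical illustration; mirror the structure actually used in claude_converter.py.
from fiqh_card_converter.claude_converter import FIQH_ISSUE_SCHEMA

FIQH_ISSUE_SCHEMA["properties"]["madhhab_summary"] = {
    "type": "string",
    "description": "One-line summary of each school's position",
}
FIQH_ISSUE_SCHEMA["required"].append("madhhab_summary")
```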
Always test changes using the preview command first to see the markdown conversion without consuming API credits:

```bash
PYTHONPATH=src uv run python -m fiqh_card_converter.claude_cli preview
```

## TODO

- Port the qul_tafsir converter from Vectara to Goodmem
- Add batch processing with progress tracking
- Implement caching to avoid reprocessing files
## License

[Add license information here]
## Contact

[Add contact information here]