# Indexing

A collection of converters for processing Islamic texts, with a focus on fiqh cards and tafsir data.
This project provides tools for converting Islamic jurisprudence (fiqh) documents from DOCX format to structured JSON, using Claude Opus 4.1 for intelligent extraction.

## Features
- AI-Powered Extraction: Uses Claude Opus 4.1 to intelligently parse complex Arabic fiqh texts
- Structured JSON Output: Converts unstructured DOCX tables into well-organized JSON
- Handles Multiple Opinions: Correctly separates and attributes different scholarly positions
- Markdown Preview: View how documents are converted before processing
## Prerequisites

- Python 3.11 or higher
- UV package manager
## Installation

- Clone the repository:

  ```bash
  git clone [repository-url]
  cd indexing
  ```

- Install dependencies using UV:

  ```bash
  uv sync
  ```

- Set up environment variables: create a `.env` file in the project root with your Anthropic API key:

  ```
  ANTHROPIC_API_KEY=your-api-key-here
  ```
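If you call the converter from your own scripts, the key has to reach the process somehow; a common pattern is python-dotenv. A minimal sketch, assuming python-dotenv is installed (check the code for how the project actually loads its configuration):

```python
import os

from dotenv import load_dotenv  # assumes the python-dotenv package is available

load_dotenv()  # reads .env from the current working directory

if not os.environ.get("ANTHROPIC_API_KEY"):
    raise RuntimeError("ANTHROPIC_API_KEY is not set; add it to .env")
```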
## Usage

### Fiqh Card Converter

The fiqh card converter uses Claude Opus 4.1 to extract structured information from Islamic jurisprudence documents.

To see how a DOCX file will be converted to markdown (without calling the API):

```bash
PYTHONPATH=src uv run python -m fiqh_card_converter.claude_cli preview
```

To test the converter on the sample file:

```bash
PYTHONPATH=src uv run python -m fiqh_card_converter.claude_cli test
```

This will:
- Process `sample_input_data/fiqh_cards/sample.docx`
- Save output to `sample_output_data/fiqh_cards/sample_claude.json`
- Display a summary of extracted issues (a sketch for inspecting the saved JSON follows below)
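A minimal sketch of that inspection, assuming the output file holds the extracted issues either as a top-level list or under an `issues` key (check the generated file for its exact shape):

```python
import json
from pathlib import Path

output_path = Path("sample_output_data/fiqh_cards/sample_claude.json")
data = json.loads(output_path.read_text(encoding="utf-8"))

# Normalise the two likely top-level shapes: a bare list or {"issues": [...]}.
issues = data if isinstance(data, list) else data.get("issues", [])

for issue in issues:
    print(f'{issue["issue_number"]}: {issue["question"]}')
    for opinion in issue["opinions"]:
        print("  -", opinion["position"], "|", ", ".join(opinion["scholars"]))
```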
To process all DOCX files in a directory:

```bash
PYTHONPATH=src uv run python -m fiqh_card_converter.claude_cli convert \
    sample_input_data/fiqh_cards \
    sample_output_data/fiqh_cards
```

### Qul Tafsir Converter

The Qul Tafsir converter processes tafsir (Quranic commentary) data and uploads it to Agentset for RAG (Retrieval-Augmented Generation) applications.
To generate individual text files with metadata:

```bash
uv run python -m qul_tafsir.cli convert-agentset ibn-kathir --start-surah 1 --end-surah 115
```

To upload all generated files to Agentset:

```bash
uv run python -m qul_tafsir.cli ingest-agentset ibn-kathir
```

This command:
- Uploads all section files to S3
- Creates a batch ingest job
- Monitors job status until completion
- Supports checkpoint/resume on failure (sketched below)
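The checkpoint/resume behaviour is a standard pattern: record each completed upload, and skip anything already recorded when the job is rerun. A generic sketch of the idea, not the converter's actual code (`upload_file` is a hypothetical stand-in for the real S3 upload step):

```python
import json
from pathlib import Path

CHECKPOINT = Path(".ingest_checkpoint.json")

def upload_file(path: Path) -> None:
    """Hypothetical stand-in for the real S3 upload."""
    print(f"uploading {path}")

def ingest(files: list[Path]) -> None:
    # Resume from whatever a previous run managed to finish.
    done = set(json.loads(CHECKPOINT.read_text())) if CHECKPOINT.exists() else set()
    for path in files:
        if str(path) in done:
            continue  # already uploaded in an earlier run
        upload_file(path)
        done.add(str(path))
        # Persist after every file so a crash loses at most the in-flight upload.
        CHECKPOINT.write_text(json.dumps(sorted(done)))
```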
Environment variables required:

```
AGENTSET_API_TOKEN=your-agentset-token
AGENTSET_NAMESPACE_ID=your-namespace-id
```
If you accidentally upload duplicates, you can clean them up.

Preview duplicates (dry run):

```bash
uv run python -m qul_tafsir.cli deduplicate-agentset
```

Actually delete duplicates (keeps the oldest by default):

```bash
uv run python -m qul_tafsir.cli deduplicate-agentset --no-dry-run
```

Keep the newest instead of the oldest:

```bash
uv run python -m qul_tafsir.cli deduplicate-agentset --no-dry-run --keep newest
```
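Conceptually, deduplication groups uploaded documents, orders each group by creation time, and deletes all but one copy. A stand-alone sketch of that selection logic, with hypothetical record data and assuming duplicates are matched by document name (the actual CLI talks to the Agentset API):

```python
from collections import defaultdict

def duplicates_to_delete(records: list[tuple[str, str, str]], keep: str = "oldest") -> list[str]:
    """Return document IDs to delete, keeping one copy per document name."""
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for name, created_at, doc_id in records:
        groups[name].append((created_at, doc_id))
    to_delete = []
    for copies in groups.values():
        copies.sort()  # ISO-8601 timestamps sort chronologically as strings
        keep_index = 0 if keep == "oldest" else len(copies) - 1
        to_delete += [doc_id for i, (_, doc_id) in enumerate(copies) if i != keep_index]
    return to_delete

# Hypothetical records: (document_name, created_at, document_id).
records = [
    ("surah-001-section-1.txt", "2025-01-01T10:00:00Z", "doc-a"),
    ("surah-001-section-1.txt", "2025-01-02T10:00:00Z", "doc-b"),
]
print(duplicates_to_delete(records))            # ['doc-b'] (keeps the oldest)
print(duplicates_to_delete(records, "newest"))  # ['doc-a']
```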
## Output Schema

The converter extracts the following information from each fiqh issue:

```json
{
"issue_number": 1,
"question": "The main fiqh question",
"context": "Background and context of the disagreement",
"opinions": [
{
"position": "The opinion/ruling",
"scholars": ["Scholar1", "Scholar2"]
}
],
"disagreement_reason": "Why scholars disagree",
"evidence": {
"Evidence_1": "Quranic verses and hadiths",
"Evidence_2": "Additional proofs"
},
"preferred_opinion": "The strongest opinion",
"practical_impact": "Real-world implications",
"references": "Source references"
}
```

## Project Structure

```
indexing/
├── src/
│   ├── fiqh_card_converter/
│   │   ├── __init__.py
│   │   ├── claude_converter.py   # Core converter using Claude AI
│   │   └── claude_cli.py         # Command-line interface
│   └── qul_tafsir/               # Tafsir converter (to be ported to Goodmem)
├── sample_input_data/
│   └── fiqh_cards/               # Sample DOCX files
├── sample_output_data/
│   └── fiqh_cards/               # Generated JSON output
├── .env                          # Environment variables (create this)
├── pyproject.toml                # Project configuration
└── README.md                     # This file
```
## How It Works

1. Table Extraction: Reads tables from DOCX files containing fiqh issues
2. Markdown Conversion: Converts the table data to clean markdown format (steps 1 and 2 are sketched after this list)
3. AI Processing: Sends the markdown to Claude Opus 4.1 with a structured prompt
4. JSON Generation: Claude extracts and returns structured JSON data
5. Validation: Ensures all required fields are present in the output
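Steps 1 and 2 can be approximated in a few lines; a minimal sketch assuming the tables are read with python-docx (see `claude_converter.py` for the real implementation):

```python
from docx import Document  # pip install python-docx

def tables_to_markdown(path: str) -> str:
    """Render every table in a DOCX file as a pipe-delimited markdown table."""
    doc = Document(path)
    blocks = []
    for table in doc.tables:
        rows = [[cell.text.strip() for cell in row.cells] for row in table.rows]
        if not rows:
            continue
        lines = ["| " + " | ".join(rows[0]) + " |",
                 "| " + " | ".join("---" for _ in rows[0]) + " |"]
        lines += ["| " + " | ".join(r) + " |" for r in rows[1:]]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)

print(tables_to_markdown("sample_input_data/fiqh_cards/sample.docx"))
```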
## API Requirements

This project requires an Anthropic API key with access to Claude Opus 4.1. The model configuration is:

- Model ID: `claude-opus-4-1-20250805`
- Max tokens: 16384
- Temperature: 0 (for consistent output)
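With the Anthropic Python SDK, a call using those settings looks roughly like the following; this is a sketch of the AI-processing step, not the converter's exact code, and the prompt content is a placeholder:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-1-20250805",
    max_tokens=16384,
    temperature=0,  # deterministic output for repeatable extraction
    messages=[{"role": "user", "content": "<markdown of the fiqh card goes here>"}],
)
print(response.content[0].text)
```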
## Development

The converter is designed to be extensible. To modify the extraction schema, edit `FIQH_ISSUE_SCHEMA` in `claude_converter.py`.
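For example, assuming `FIQH_ISSUE_SCHEMA` is a JSON-Schema-style dict with `properties` and `required` keys (check the module for its actual shape), adding a field might look like this; `madhhab_summary` is a hypothetical addition:

```python
# Hypothetical illustration; mirror the structure actually used in claude_converter.py.
from fiqh_card_converter.claude_converter import FIQH_ISSUE_SCHEMA

FIQH_ISSUE_SCHEMA["properties"]["madhhab_summary"] = {
    "type": "string",
    "description": "One-line summary of each school's position",
}
FIQH_ISSUE_SCHEMA["required"].append("madhhab_summary")
```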
Always test changes using the preview command first to see the markdown conversion without consuming API credits:

```bash
PYTHONPATH=src uv run python -m fiqh_card_converter.claude_cli preview
```

## TODO

- Port the qul_tafsir converter from Vectara to Goodmem
- Add batch processing with progress tracking
- Implement caching to avoid reprocessing files
## License

[Add license information here]
## Contact

[Add contact information here]