Persian Speech-to-Text FastAPI Backend

A high-performance FastAPI backend service for Persian speech recognition using the jonatasgrosman/wav2vec2-large-xlsr-53-persian model.

Features

🚀 FastAPI: Modern, fast web framework with automatic API documentation
🇮🇷 Persian Language Support: Specialized for Persian speech recognition
🎯 High Accuracy: 30.12% WER and 7.37% CER on Common Voice Persian test set
📱 Mobile Ready: Optimized for React Native integration
🔄 Async Support: Full async/await support for better performance
📊 Confidence Scoring: Provides transcription confidence levels
🎵 Multiple Formats: Supports WAV, MP3, M4A, FLAC, OGG, AAC
📖 Auto Documentation: Interactive API docs at /docs
🔧 GPU Support: Automatic GPU detection and utilization

Installation

Install Python 3.10+

```
python --version  # Should be 3.10 or higher
```

Create virtual environment (recommended)

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies
```
pip install -r requirements.txt
```

Quick Start

Start the server
```
python main.py
# or
python run.py
```
Access the API
- Server: http://localhost:8000
- Interactive docs: http://localhost:8000/docs
- Alternative docs: http://localhost:8000/redoc

API Endpoints

Health Check

GET /health
Returns service status and model information
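
Because the first request can take several seconds while the model loads, a client may want to poll the health endpoint before sending audio. The following is a minimal sketch using only the Python standard library. The helper names `is_healthy` and `wait_until_healthy` are hypothetical, and the check relies only on an HTTP 200 status, since the exact fields of the health response are not specified here.

```python
import time
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to the health URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx status.
        return False


def wait_until_healthy(url: str, attempts: int = 10, delay: float = 1.0,
                       probe=is_healthy) -> bool:
    """Call probe(url) up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if probe(url):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    # Assumes the server from Quick Start is running locally on port 8000.
    print(wait_until_healthy("http://localhost:8000/health"))
```

The `probe` parameter exists so the retry loop can be exercised without a running server.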

Transcribe Audio File

POST /transcribe
Upload audio file via multipart/form-data
Content-Type: multipart/form-data
Parameter: audio (file upload)
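
As an illustration of the upload format, the sketch below builds a multipart/form-data request with a single file field named `audio`, using only the Python standard library. The function name `build_multipart_request` and the default `audio/wav` content type are assumptions made for this example; a library such as `requests` would normally do this encoding for you.

```python
import uuid
import urllib.request


def build_multipart_request(url: str, filename: str, audio_bytes: bytes,
                            content_type: str = "audio/wav") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST with one 'audio' file field."""
    boundary = uuid.uuid4().hex
    # Part header: field name, original filename, and the part's media type.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    # Closing delimiter marks the end of the multipart body.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url,
        data=head + audio_bytes + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


if __name__ == "__main__":
    import json
    # Assumes a local server and a file named sample.wav (both hypothetical).
    with open("sample.wav", "rb") as f:
        req = build_multipart_request("http://localhost:8000/transcribe", "sample.wav", f.read())
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["transcription"])
```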

Transcribe Base64 Audio

POST /transcribe-base64
Send base64 encoded audio data
Content-Type: application/json
Body: {"audio_base64": "base64_encoded_audio"}
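
A Python client can prepare the same JSON body as follows. This is a minimal standard-library sketch; the helper name `build_base64_request` is hypothetical, and only the `audio_base64` field shown above is assumed.

```python
import base64
import json
import urllib.request


def build_base64_request(url: str, audio_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying the audio as a base64 string."""
    payload = {"audio_base64": base64.b64encode(audio_bytes).decode("ascii")}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Assumes a local server and a file named sample.wav (both hypothetical).
    with open("sample.wav", "rb") as f:
        req = build_base64_request("http://localhost:8000/transcribe-base64", f.read())
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["transcription"])
```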

Transcribe from URL

POST /transcribe-url
Transcribe audio from a URL (for testing)
Content-Type: application/json
Body: {"audio_url": "https://example.com/audio.wav"}

Response Format

```
{
  "success": true,
  "transcription": "متن تبدیل شده به فارسی",
  "confidence": 0.95,
  "language": "persian",
  "model": "jonatasgrosman/wav2vec2-large-xlsr-53-persian",
  "audio_duration": 5.2,
  "processing_time": 1.8
}
```
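
One way to consume this response in Python is to map it onto a small dataclass. The sketch below assumes the seven fields shown above are always present in a successful response. The class name `TranscriptionResult` and the derived `real_time_factor` property are illustrative additions, not part of the API.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class TranscriptionResult:
    success: bool
    transcription: str
    confidence: float
    language: str
    model: str
    audio_duration: float   # seconds of audio received
    processing_time: float  # seconds spent transcribing

    @property
    def real_time_factor(self) -> float:
        """Processing time divided by audio duration; below 1.0 is faster than real time."""
        return self.processing_time / self.audio_duration


def parse_transcription(raw: str) -> TranscriptionResult:
    """Parse a JSON response body, keeping only the documented fields."""
    data = json.loads(raw)
    return TranscriptionResult(**{f.name: data[f.name] for f in fields(TranscriptionResult)})
```

For the sample response above, `real_time_factor` is 1.8 / 5.2, or roughly 0.35.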

Audio Requirements

- Sample Rate: Automatically converted to 16kHz
- Duration: 0.5 - 30 seconds
- Formats: WAV, MP3, M4A, FLAC, OGG, AAC
- Quality: Higher quality audio produces better results
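
A client can check the duration limit before uploading to avoid a wasted round trip. The sketch below handles uncompressed WAV input only, using the standard `wave` module; other formats would need a decoding library. The function names are hypothetical, and treating the 0.5 and 30 second bounds as inclusive is an assumption.

```python
import io
import wave

MIN_SECONDS = 0.5   # lower duration limit from the requirements above
MAX_SECONDS = 30.0  # upper duration limit from the requirements above


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of in-memory WAV data as frames divided by frame rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_duration_accepted(wav_bytes: bytes) -> bool:
    """Return True if the WAV duration lies within the limits (bounds assumed inclusive)."""
    return MIN_SECONDS <= wav_duration_seconds(wav_bytes) <= MAX_SECONDS
```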

Performance

- First Request: ~5-10 seconds (model loading)
- Subsequent Requests: ~1-3 seconds
- GPU Acceleration: Automatic if CUDA is available
- Concurrent Requests: Supports multiple simultaneous requests

Testing

Run the test suite:

```
python test_api.py
```

Server Deployment

Deploy to Server (65.21.115.188)

```
./deploy_to_server.sh
```

This will:

- Install Python 3.10 and dependencies
- Create virtual environment
- Install FastAPI dependencies
- Set up systemd service
- Start the service on port 8001

Nginx Configuration

For aiapp.sazjoo.com domain:

```
# Copy nginx config to server
scp nginx-aiapp-config.conf [email protected]:/etc/nginx/sites-available/aiapp.sazjoo.com

# On server:
ln -s /etc/nginx/sites-available/aiapp.sazjoo.com /etc/nginx/sites-enabled/
nginx -t
systemctl reload nginx
```

Integration with React Native

Update your React Native app's backend URL to:

http://aiapp.sazjoo.com

Example usage in React Native:

```
const response = await fetch('http://aiapp.sazjoo.com/transcribe-base64', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    audio_base64: base64AudioData,
  }),
});

const result = await response.json();
console.log('Transcription:', result.transcription);
```

Troubleshooting

Model Loading Issues

- Ensure a stable internet connection for the initial model download
- Check available disk space (model is ~1.5GB)
- Verify Python version compatibility

Audio Processing Issues

- Check that the audio file format is supported
- Ensure audio duration is within limits (0.5-30 seconds)
- Verify the audio file is not corrupted

Performance Issues

- Install CUDA for GPU acceleration
- Increase server resources for better performance
- Consider using multiple workers in production

Model Information

- Model: jonatasgrosman/wav2vec2-large-xlsr-53-persian
- Base Model: Facebook's wav2vec2-large-xlsr-53
- Training Data: Common Voice 6.1 Persian dataset
- Performance: 30.12% WER, 7.37% CER on test set
- Input Requirements: 16kHz audio
- Output: Persian text transcription

License

This service uses the wav2vec2 model, which is subject to its own license terms.
