# Video Translator

A tool for translating videos between languages, with automatic transcription, translation, and voice synthesis.
## Table of Contents

- Features
- Prerequisites
- Installation
- Usage
- Output Structure
- How It Works
- Models and Voices
- Supported Languages
- Requirements
- Troubleshooting
- License
- Acknowledgments
## Features

- 🎤 Automatic video transcription using OpenAI's Whisper
- 🌐 Multi-language translation using M2M100
- 🔊 Text-to-speech with gTTS
- 🎵 Optional RVC voice conversion
- 🌍 Support for multiple languages
- 💾 Progress saving and resuming
- ⏱️ Automatic audio timing synchronization
## Prerequisites

- Python 3.8 or higher
- FFmpeg installed and added to PATH
- Internet connection (for translation and TTS)
- CUDA-capable GPU (recommended for RVC; the project also works without RVC)
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/video-translator.git
  cd video-translator
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  # On Windows:
  venv\Scripts\activate
  # On Unix or macOS:
  source venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Install FFmpeg (required for audio/video processing):
  - Windows: download from ffmpeg.org and add to PATH
  - Linux:

    ```bash
    sudo apt-get install ffmpeg
    ```

  - macOS:

    ```bash
    brew install ffmpeg
    ```
## Usage

Basic usage:

- Place your video file in the project directory, e.g. `path/to/video.mp4`
- Run the script:

  ```bash
  python main.py "path/to/video.mp4"

  # or, for better transcription and translation:
  python main.py '.\output\The_Best_Way_to_Learn_Linux\The Best Way to Learn Linux.webm' -s en -t ru -g male -w medium -tr m2m100_1.2B

  # or, using an RVC model:
  python main.py your_video.mp4 --rvc-model "models/rvc/male/ru/drevnyirus.pth" -s en -t ru -g male -w medium -tr m2m100_1.2B

  # Use the GPU if available:
  python main.py '.\output\The_Best_Way_to_Learn_Linux\The Best Way to Learn Linux.webm' -s en -t ru -g male -w base --use-gpu
  ```

With language options:

```bash
python main.py "path/to/video.mp4" --source-lang en --target-lang ru

# Disable RVC
python main.py "path/to/video.mp4" --no-rvc

# Use a specific RVC model
python main.py "path/to/video.mp4" --rvc-model "models/rvc/your_model"
```
## Output Structure

The script creates an organized output structure:

```
output/
└── video_name/
    ├── tts-chunks/                # Individual TTS audio chunks
    │   ├── video_name_0000.mp3
    │   ├── video_name_0001.mp3
    │   └── ...
    ├── transcript.txt             # Original transcription
    ├── translated.txt             # Translated text
    ├── audio_dubbed.mp3           # Combined dubbed audio
    └── video_name_dubbed.mp4     # Final video with dubbed audio
```
## How It Works

The main processing pipeline:

- **Transcription**: Whisper converts speech to text
- **Translation**: the text is translated with the M2M100 model
- **TTS Generation**: gTTS synthesizes audio for the translated text
- **Audio Processing**: audio timing is adjusted to match the video (see the sketch below)
- **Video Creation**: the original video is combined with the new audio
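To illustrate the timing step, here is a minimal sketch of stretching or compressing a TTS chunk to fit its original speech segment with FFmpeg's `atempo` filter. The helper name and paths are hypothetical, not the project's actual code:

```python
import subprocess

def fit_audio_to_segment(tts_path: str, out_path: str,
                         segment_duration: float, tts_duration: float) -> None:
    """Stretch or compress a TTS chunk so it matches the original segment length."""
    # tempo > 1.0 speeds the audio up, < 1.0 slows it down;
    # FFmpeg's atempo filter accepts roughly 0.5-2.0 per pass.
    tempo = max(0.5, min(2.0, tts_duration / segment_duration))
    subprocess.run(
        ["ffmpeg", "-y", "-i", tts_path, "-filter:a", f"atempo={tempo:.3f}", out_path],
        check=True,
    )
```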
The script saves progress at each step:

- If `transcript.txt` exists, it skips transcription
- If `translated.txt` exists, it skips translation
- If TTS chunks exist, it skips TTS generation
- If the final files exist, it skips final processing
To force reprocessing, delete the corresponding files.
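A minimal sketch of this resume logic, assuming a hypothetical `run_step` helper (the real checks live in `main.py`):

```python
from pathlib import Path

def run_step(output_file: Path, step_name: str, step_fn) -> None:
    """Run a pipeline step only if its output file does not already exist."""
    if output_file.exists():
        print(f"⏩ {step_name}: {output_file.name} found, skipping")
        return
    step_fn(output_file)

# e.g. run_step(Path("output/video_name/transcript.txt"), "Transcription", transcribe)
```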
## Models and Voices

All models are automatically downloaded to the `models/` directory in your project folder when first used. This includes:

- Whisper models (tiny, base, small, medium, large)
- M2M100 translation models
- NLLB translation models

The models are downloaded only once and reused on subsequent runs. You can find them in:
```
models/
├── whisper/      # Whisper transcription models
├── m2m100/       # M2M100 translation models
├── nllb/         # NLLB translation models
└── rvc/          # RVC voice conversion models (if used)
    ├── male/
    │   └── ru/
    │       ├── added_drevnyirus_v2.index
    │       └── drevnyirus.pth
    └── female/
```
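If you want to see roughly how these download directories are wired up, `whisper.load_model` accepts a `download_root` and Hugging Face's `from_pretrained` accepts a `cache_dir`. This is a sketch; the exact paths used in `main.py` may differ:

```python
import whisper
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

# Whisper caches its checkpoint under the given download_root.
model = whisper.load_model("base", download_root="models/whisper")

# Hugging Face models cache under cache_dir instead of ~/.cache/huggingface.
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M", cache_dir="models/m2m100")
translator = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M", cache_dir="models/m2m100")
```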
### Whisper Models

The default model is `base`, but you can use different Whisper models for better accuracy:
| Model  | Size | RAM   | Speed   | Quality |
|--------|------|-------|---------|---------|
| tiny   | 1GB  | ~1GB  | Fastest | Basic   |
| base   | 1GB  | ~1GB  | Fast    | Good    |
| small  | 2GB  | ~2GB  | Medium  | Better  |
| medium | 5GB  | ~5GB  | Slow    | Great   |
| large  | 10GB | ~10GB | Slowest | Best    |
To change the Whisper model:

```python
import whisper

def transcribe_video(video_path, transcript_path, source_lang='en'):
    print("🔍 Loading Whisper model...")
    # Change "base" to any of: "tiny", "base", "small", "medium", "large"
    model = whisper.load_model("base")
    # Sketch of the rest of the step: transcribe and save the text.
    result = model.transcribe(video_path, language=source_lang)
    with open(transcript_path, "w", encoding="utf-8") as f:
        f.write(result["text"])
```
### gTTS Voices

- Automatic voice selection based on the target language
- Natural-sounding voices for each supported language
- No additional configuration needed
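Generating a chunk comes down to passing the target language code to gTTS. A minimal sketch (the function name and chunk path are illustrative, following the output layout above):

```python
from gtts import gTTS

def synthesize_chunk(text: str, target_lang: str, out_path: str) -> None:
    """Generate one TTS chunk; gTTS selects the voice from the language code."""
    gTTS(text=text, lang=target_lang).save(out_path)

synthesize_chunk("Привет, мир!", "ru", "tts-chunks/video_name_0000.mp3")
```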
### RVC Voice Conversion

- Create a `models/rvc/` directory
- Add your RVC model files (`.pth` and `.index`)
- Update the model path:

```python
rvc = RVCConverter("models/rvc/your_model_name")
```

Available RVC models:

- Male voices: add your male voice model files
- Female voices: add your female voice model files
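To sanity-check a model directory before pointing the converter at it, something like this works (a hypothetical helper snippet, not project code):

```python
from pathlib import Path

# Verify a model directory contains the files RVC needs.
model_dir = Path("models/rvc/male/ru")
pth_file = next(model_dir.glob("*.pth"))       # e.g. drevnyirus.pth
index_file = next(model_dir.glob("*.index"))   # e.g. added_drevnyirus_v2.index
print(f"Using RVC model {pth_file} with index {index_file}")
```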
## Supported Languages

- English (en)
- Russian (ru)
- Spanish (es)
- French (fr)
- German (de)
- Italian (it)
- Portuguese (pt)
- Japanese (ja)
- Korean (ko)
- Chinese (zh)
## Requirements

- Python 3.8+
- FFmpeg
- CUDA-capable GPU (recommended for RVC)
- See `requirements.txt` for Python dependencies
## Troubleshooting

- **FFmpeg not found**
  - Install FFmpeg and add it to your system PATH
  - Verify the installation: `ffmpeg -version`
- **Translation quality**
  - Try different Whisper models for better transcription
  - Check that the source language is set correctly
- **Voice quality**
  - Use RVC for better voice quality
  - Try different RVC models for different voices
- **GPU issues**
  - Ensure CUDA is properly installed
  - Check GPU memory usage (see the snippet below)
  - Try smaller models if you run out of memory
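A quick way to check CUDA availability and GPU memory from the project's Python environment (illustrative only):

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 1024**3:.1f} GB total")
    print(f"Currently allocated: {torch.cuda.memory_allocated(0) / 1024**3:.2f} GB")
else:
    print("CUDA not available; RVC will run slowly or not at all on CPU")
```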
## License

MIT License
## Acknowledgments

- OpenAI Whisper for speech recognition
- gTTS for text-to-speech
- FFmpeg for video processing
- M2M100 for translation