# Video Translator

A tool for translating videos between languages, with automatic transcription, translation, and voice synthesis.
## Table of Contents

- Features
- Prerequisites
- Installation
- Usage
- Output Structure
- How It Works
- Models and Voices
- Supported Languages
- Requirements
- Troubleshooting
- License
- Acknowledgments
## Features

- 🎤 Automatic video transcription using OpenAI's Whisper
- 🌐 Multi-language translation using M2M100
- 🔊 Text-to-speech with gTTS
- 🎵 Optional RVC voice conversion
- 🌍 Support for multiple languages
- 💾 Progress saving and resuming
- ⏱️ Automatic audio timing synchronization
## Prerequisites

- Python 3.8 or higher
- FFmpeg installed and added to PATH
- Internet connection (for translation and TTS)
- CUDA-capable GPU (recommended for RVC; the project also works without RVC)
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/video-translator.git
  cd video-translator
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  # On Windows:
  venv\Scripts\activate
  # On Unix or macOS:
  source venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Install FFmpeg (required for audio/video processing):
  - Windows: download from ffmpeg.org and add to PATH
  - Linux:

    ```bash
    sudo apt-get install ffmpeg
    ```

  - macOS:

    ```bash
    brew install ffmpeg
    ```
## Usage

Basic usage:

- Place your video file in the project directory, e.g. `path/to/video.mp4`
- Run the script:

  ```bash
  python main.py "path/to/video.mp4"

  # or, for better transcription and translation:
  python main.py '.\output\The_Best_Way_to_Learn_Linux\The Best Way to Learn Linux.webm' -s en -t ru -g male -w medium -tr m2m100_1.2B

  # or, using an RVC model:
  python main.py your_video.mp4 --rvc-model "models/rvc/male/ru/drevnyirus.pth" -s en -t ru -g male -w medium -tr m2m100_1.2B

  # Use the GPU if available:
  python main.py '.\output\The_Best_Way_to_Learn_Linux\The Best Way to Learn Linux.webm' -s en -t ru -g male -w base --use-gpu
  ```

With language options:

```bash
python main.py "path/to/video.mp4" --source-lang en --target-lang ru

# Disable RVC
python main.py "path/to/video.mp4" --no-rvc

# Use a specific RVC model
python main.py "path/to/video.mp4" --rvc-model "models/rvc/your_model"
```
## Output Structure

The script creates an organized output structure:

```
output/
└── video_name/
    ├── tts-chunks/                # Individual TTS audio chunks
    │   ├── video_name_0000.mp3
    │   ├── video_name_0001.mp3
    │   └── ...
    ├── transcript.txt             # Original transcription
    ├── translated.txt             # Translated text
    ├── audio_dubbed.mp3           # Combined dubbed audio
    └── video_name_dubbed.mp4     # Final video with dubbed audio
```
## How It Works

The main processing pipeline:

- **Transcription**: Whisper converts speech to text
- **Translation**: the text is translated with the M2M100 model
- **TTS Generation**: gTTS synthesizes audio for the translated text
- **Audio Processing**: audio timing is adjusted to match the video (see the sketch below)
- **Video Creation**: the original video is combined with the new audio
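To illustrate the timing step, here is a minimal sketch of stretching or compressing a TTS chunk to fit its original speech segment with FFmpeg's `atempo` filter. The helper name and paths are hypothetical, not the project's actual code:

```python
import subprocess

def fit_audio_to_segment(tts_path: str, out_path: str,
                         segment_duration: float, tts_duration: float) -> None:
    """Stretch or compress a TTS chunk so it matches the original segment length."""
    # tempo > 1.0 speeds the audio up, < 1.0 slows it down;
    # FFmpeg's atempo filter accepts roughly 0.5-2.0 per pass.
    tempo = max(0.5, min(2.0, tts_duration / segment_duration))
    subprocess.run(
        ["ffmpeg", "-y", "-i", tts_path, "-filter:a", f"atempo={tempo:.3f}", out_path],
        check=True,
    )
```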
The script saves progress at each step:

- If `transcript.txt` exists, it skips transcription
- If `translated.txt` exists, it skips translation
- If TTS chunks exist, it skips TTS generation
- If the final files exist, it skips final processing
To force reprocessing, delete the corresponding files.
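A minimal sketch of this resume logic, assuming a hypothetical `run_step` helper (the real checks live in `main.py`):

```python
from pathlib import Path

def run_step(output_file: Path, step_name: str, step_fn) -> None:
    """Run a pipeline step only if its output file does not already exist."""
    if output_file.exists():
        print(f"⏩ {step_name}: {output_file.name} found, skipping")
        return
    step_fn(output_file)

# e.g. run_step(Path("output/video_name/transcript.txt"), "Transcription", transcribe)
```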
## Models and Voices

All models are automatically downloaded to the `models/` directory in your project folder when first used. This includes:

- Whisper models (tiny, base, small, medium, large)
- M2M100 translation models
- NLLB translation models

The models are downloaded only once and reused on subsequent runs. You can find them in:
```
models/
├── whisper/      # Whisper transcription models
├── m2m100/       # M2M100 translation models
├── nllb/         # NLLB translation models
└── rvc/          # RVC voice conversion models (if used)
    ├── male/
    │   └── ru/
    │       ├── added_drevnyirus_v2.index
    │       └── drevnyirus.pth
    └── female/
```
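If you want to see roughly how these download directories are wired up, `whisper.load_model` accepts a `download_root` and Hugging Face's `from_pretrained` accepts a `cache_dir`. This is a sketch; the exact paths used in `main.py` may differ:

```python
import whisper
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

# Whisper caches its checkpoint under the given download_root.
model = whisper.load_model("base", download_root="models/whisper")

# Hugging Face models cache under cache_dir instead of ~/.cache/huggingface.
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M", cache_dir="models/m2m100")
translator = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M", cache_dir="models/m2m100")
```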
### Whisper Models

The default model is `base`, but you can use different Whisper models for better accuracy:
| Model  | Size | RAM   | Speed   | Quality |
|--------|------|-------|---------|---------|
| tiny   | 1GB  | ~1GB  | Fastest | Basic   |
| base   | 1GB  | ~1GB  | Fast    | Good    |
| small  | 2GB  | ~2GB  | Medium  | Better  |
| medium | 5GB  | ~5GB  | Slow    | Great   |
| large  | 10GB | ~10GB | Slowest | Best    |
To change the Whisper model:

```python
import whisper

def transcribe_video(video_path, transcript_path, source_lang='en'):
    print("🔍 Loading Whisper model...")
    # Change "base" to any of: "tiny", "base", "small", "medium", "large"
    model = whisper.load_model("base")
    # Sketch of the rest of the step: transcribe and save the text.
    result = model.transcribe(video_path, language=source_lang)
    with open(transcript_path, "w", encoding="utf-8") as f:
        f.write(result["text"])
```
### gTTS Voices

- Automatic voice selection based on the target language
- Natural-sounding voices for each supported language
- No additional configuration needed
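Generating a chunk comes down to passing the target language code to gTTS. A minimal sketch (the function name and chunk path are illustrative, following the output layout above):

```python
from gtts import gTTS

def synthesize_chunk(text: str, target_lang: str, out_path: str) -> None:
    """Generate one TTS chunk; gTTS selects the voice from the language code."""
    gTTS(text=text, lang=target_lang).save(out_path)

synthesize_chunk("Привет, мир!", "ru", "tts-chunks/video_name_0000.mp3")
```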
### RVC Voice Conversion

- Create a `models/rvc/` directory
- Add your RVC model files (`.pth` and `.index`)
- Update the model path:

```python
rvc = RVCConverter("models/rvc/your_model_name")
```

Available RVC models:

- Male voices: add your male voice model files
- Female voices: add your female voice model files
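To sanity-check a model directory before pointing the converter at it, something like this works (a hypothetical helper snippet, not project code):

```python
from pathlib import Path

# Verify a model directory contains the files RVC needs.
model_dir = Path("models/rvc/male/ru")
pth_file = next(model_dir.glob("*.pth"))       # e.g. drevnyirus.pth
index_file = next(model_dir.glob("*.index"))   # e.g. added_drevnyirus_v2.index
print(f"Using RVC model {pth_file} with index {index_file}")
```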
## Supported Languages

- English (en)
- Russian (ru)
- Spanish (es)
- French (fr)
- German (de)
- Italian (it)
- Portuguese (pt)
- Japanese (ja)
- Korean (ko)
- Chinese (zh)
## Requirements

- Python 3.8+
- FFmpeg
- CUDA-capable GPU (recommended for RVC)
- See `requirements.txt` for Python dependencies
## Troubleshooting

- **FFmpeg not found**
  - Install FFmpeg and add it to your system PATH
  - Verify the installation: `ffmpeg -version`
- **Translation quality**
  - Try different Whisper models for better transcription
  - Check that the source language is set correctly
- **Voice quality**
  - Use RVC for better voice quality
  - Try different RVC models for different voices
- **GPU issues**
  - Ensure CUDA is properly installed
  - Check GPU memory usage (see the snippet below)
  - Try smaller models if you run out of memory
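A quick way to check CUDA availability and GPU memory from the project's Python environment (illustrative only):

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 1024**3:.1f} GB total")
    print(f"Currently allocated: {torch.cuda.memory_allocated(0) / 1024**3:.2f} GB")
else:
    print("CUDA not available; RVC will run slowly or not at all on CPU")
```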
## License

MIT License
## Acknowledgments

- OpenAI Whisper for speech recognition
- gTTS for text-to-speech
- FFmpeg for video processing
- M2M100 for translation