Convert text to speech using the ElevenLabs client. Ensure rawtext.md is updated, all environment keys are loaded, and outputs are available in .mp3, .txt, and .log formats.
You can see sample output here: https://github.com/smol-ai/temp
- Make sure ffmpeg is installed, and install ImageMagick (shown here with Homebrew):
  `brew install imagemagick`
- Install Dependencies:
  `pip install -r requirements.txt`
- Set Environment Variables:
  `export ELEVENLABS_API_KEY=your_api_key`
  `export OPENAI_API_KEY=your_api_key`
  `export CARTESIA_API_KEY=your_api_key`
  Note that this generates a lot of tokens: we used up 52k characters just developing this.
- Update `rawtext.md`: add or modify the text you want to convert to speech.
- Run the Script:
  `python main.py`
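Before running `main.py`, it can help to confirm that the setup steps above took effect and to see how much text is about to be sent. The following is a minimal sketch, not part of the repository: the helper names `missing_keys` and `count_characters` are hypothetical, and it assumes that all three keys listed above are required and that `rawtext.md` sits in the current directory.

```python
import os
from pathlib import Path

# Key names come from the setup step above; that main.py needs all three is an assumption.
REQUIRED_KEYS = ["ELEVENLABS_API_KEY", "OPENAI_API_KEY", "CARTESIA_API_KEY"]


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def count_characters(path="rawtext.md"):
    """Return the number of characters in the input text file."""
    return len(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    if Path("rawtext.md").exists():
        print("rawtext.md length:", count_characters(), "characters")
```

Checking the character count first gives a rough sense of how much of a character quota one run might consume, which matters given the usage note above.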
To generate a video from the audio and transcript, you can use the video.py script. This script uses the MoviePy library to combine the audio and image, and the OpenAI library to generate a default image using DALL·E.
- Make sure you have the `moviepy` and `openai` libraries installed. You can install them using pip:
  `pip install moviepy openai`
- Set Environment Variables:
  `export OPENAI_API_KEY=your_api_key`
- Run the Script:
  `python video.py`
- Video: `final_video.mp4` file
- The `video.py` script assumes that the `combined_dialogue.mp3` and `dialogue_transcript.txt` files are present in the same directory.
- The script generates a default image using DALL·E and resizes it to 1080x1080 pixels.
- The script combines the audio and image to create a video, and adds captions using the transcript.
- The final video is saved as `final_video.mp4` in the same directory.
- Audio: `.mp3` files
- Transcript: `.txt` files
- Logs: `.log` files
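Since `video.py` assumes that `combined_dialogue.mp3` and `dialogue_transcript.txt` are already in the working directory, a quick check before running it can save a failed run. This is a minimal sketch under that assumption, not code from the repository; the helper name `missing_inputs` is hypothetical.

```python
from pathlib import Path

# File names come from the notes on video.py above.
REQUIRED_INPUTS = ["combined_dialogue.mp3", "dialogue_transcript.txt"]


def missing_inputs(directory=".", required=REQUIRED_INPUTS):
    """Return the required input files that are not present in `directory`."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing inputs for video.py:", ", ".join(missing))
    else:
        print("All inputs found.")
```

If the list is non-empty, running `python main.py` first should produce the audio and transcript, assuming it writes them under these names.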