A secure, AI-powered video generation platform that creates videos from JSON configurations using FFmpeg and OpenAI Whisper.
VideoCraft transforms JSON configurations into complete videos with:
- Automatic scene composition from multiple media sources (video, audio, images)
- AI-powered progressive subtitles with word-level timing precision
- Security-first architecture with comprehensive input validation and CSRF protection
- Production-ready deployment with Docker and Kubernetes support
Unlike traditional subtitle systems, VideoCraft uses OpenAI Whisper to generate word-by-word timing for progressive subtitle animations. Each word appears precisely when spoken, creating engaging, TikTok-style subtitle effects.
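As a rough illustration of the idea, the sketch below turns word-level timestamps into one subtitle cue per word. The `Word` shape and the `progressive_cues` helper are hypothetical names chosen for this example, and the rule of holding each word on screen until the next one starts is an assumption, not VideoCraft's documented behavior.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """One transcribed word with its timing in seconds (hypothetical shape)."""
    text: str
    start: float
    end: float


def progressive_cues(words: List[Word]) -> List[Tuple[float, float, str]]:
    """Build one (start, end, text) cue per word.

    Each word stays visible until the next word starts, so there is no
    gap between cues; the last word ends at its own end time.
    """
    cues = []
    for i, word in enumerate(words):
        if i + 1 < len(words):
            end = words[i + 1].start
        else:
            end = word.end
        cues.append((word.start, end, word.text))
    return cues


if __name__ == "__main__":
    sample = [Word("Hello", 0.0, 0.4), Word("world", 0.5, 0.9)]
    for start, end, text in progressive_cues(sample):
        print(f"{start:.2f} -> {end:.2f}: {text}")
```

A renderer could then draw each cue with the font and color settings from the subtitle configuration.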
"subtitles": {
"style": "progressive",
"settings": {
"font_family": "Arial Black",
"font_size": 32,
"word_color": "#FFFFFF",
"outline_color": "#000000"
}
}
- Go 1.24+
- FFmpeg (with libx264)
- Python 3.8+ with pip
- Docker (recommended)
git clone https://github.com/activadee/videocraft.git
cd videocraft
# Install Python dependencies for Whisper
pip install -r scripts/requirements.txt
# Build the application
make build
# Set allowed domains for CORS (required for web clients)
export VIDEOCRAFT_SECURITY_ALLOWED_DOMAINS="localhost:3000,yourdomain.com"
# Set an API key (one is auto-generated if left unset)
export VIDEOCRAFT_SECURITY_API_KEY="your-secure-api-key"
./videocraft
# Server starts on http://localhost:3002
# Create a simple video configuration
cat > example.json << 'EOF'
[
{
"comment": "My first VideoCraft video",
"resolution": "1920x1080",
"quality": "medium",
"scenes": [
{
"id": "intro",
"background-color": "#1a1a1a",
"elements": [
{
"type": "audio",
"src": "https://example.com/your-audio.mp3"
},
{
"type": "subtitles",
"settings": {
"style": "progressive",
"font_family": "Arial Black",
"font_size": 36,
"word_color": "#FFFFFF",
"outline_color": "#FF6B6B"
}
}
]
}
]
}
]
EOF
# Generate video
curl -X POST http://localhost:3002/api/v1/videos \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-secure-api-key" \
-d @example.json
# Check job status (use job_id from response)
curl http://localhost:3002/api/v1/jobs/{job_id} \
-H "Authorization: Bearer your-secure-api-key"
# Download completed video
curl http://localhost:3002/api/v1/videos/{video_id} \
-H "Authorization: Bearer your-secure-api-key" \
-o output.mp4
VideoCraft uses an array-based JSON format supporting multiple projects:
[
{
"comment": "Project description",
"resolution": "1920x1080",
"quality": "high",
"scenes": [
{
"id": "scene1",
"background-color": "#000000",
"elements": [
{
"type": "video",
"src": "https://example.com/background.mp4",
"volume": 0.3,
"z-index": -1
},
{
"type": "audio",
"src": "https://example.com/narration.mp3",
"volume": 1.0
},
{
"type": "image",
"src": "https://example.com/logo.png",
"x": 100,
"y": 50,
"z-index": 10
},
{
"type": "subtitles",
"settings": {
"style": "progressive",
"font_family": "Impact",
"font_size": 48,
"word_color": "#FFFF00",
"outline_color": "#000000",
"position": "center-bottom"
}
}
]
}
],
"elements": []
}
]
The top-level "elements" array holds global elements that are applied to all scenes.
Video Elements
{
"type": "video",
"src": "https://example.com/video.mp4",
"x": 0, "y": 0,
"volume": 0.5,
"z-index": 1
}
Audio Elements
{
"type": "audio",
"src": "https://example.com/audio.mp3",
"volume": 1.0,
"duration": 30.5
}
Image Elements
{
"type": "image",
"src": "https://example.com/image.png",
"x": 100, "y": 200,
"z-index": 5
}
Subtitle Elements
{
"type": "subtitles",
"settings": {
"style": "progressive", // progressive or classic
"font_family": "Arial Black",
"font_size": 32,
"word_color": "#FFFFFF",
"outline_color": "#000000",
"position": "center-bottom" // top, center, bottom
}
}
All endpoints require Bearer token authentication:
Authorization: Bearer YOUR_API_KEY
Create Video Generation Job
POST /api/v1/videos
Content-Type: application/json
Body: JSON array of video configurations
Response: {"job_id": "uuid", "status": "pending"}
Get Job Status
GET /api/v1/jobs/{job_id}
Response: {
"job_id": "uuid",
"status": "completed|processing|failed|pending",
"progress": 85,
"video_id": "uuid",
"error": "error message if failed"
}
Download Video
GET /api/v1/videos/{video_id}
Response: MP4 video file
Health Check
GET /health
Response: {"status": "healthy", "timestamp": "2024-01-01T12:00:00Z"}
CSRF Token (if CSRF enabled)
GET /api/v1/csrf-token
Response: {"csrf_token": "secure-token"}
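The endpoints above can be combined into a small polling client. The following is a minimal sketch using only the Python standard library; the helper names (`build_request`, `fetch_json`, `wait_for_job`) are hypothetical, the base URL is the default from the quick start, and the response fields are assumed to match the examples documented above.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:3002"  # default server address from the quick start


def build_request(method, path, api_key, body=None):
    """Create an authenticated request; a JSON body is attached when given."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def fetch_json(req):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_job(job_id, api_key, fetch=fetch_json, interval=2.0, max_polls=300):
    """Poll the job status endpoint until the job completes or fails.

    Returns the video_id on success and raises RuntimeError on failure.
    """
    for _ in range(max_polls):
        status = fetch(build_request("GET", f"/api/v1/jobs/{job_id}", api_key))
        if status["status"] == "completed":
            return status["video_id"]
        if status["status"] == "failed":
            raise RuntimeError(status.get("error", "job failed"))
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

A typical flow would submit the configuration with `fetch_json(build_request("POST", "/api/v1/videos", key, config))`, pass the returned `job_id` to `wait_for_job`, and then download the file from `/api/v1/videos/{video_id}`.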
VideoCraft implements comprehensive security measures:
- No wildcard origins - explicit domain allowlisting required
- Secure credentials handling with proper origin validation
- Request method restrictions to approved HTTP methods
# Required: Configure allowed domains
export VIDEOCRAFT_SECURITY_ALLOWED_DOMAINS="localhost:3000,yourdomain.com"
- Token-based validation for state-changing requests
- Secure token generation with cryptographic randomness
- Optional but recommended for production environments
export VIDEOCRAFT_SECURITY_ENABLE_CSRF=true
export VIDEOCRAFT_SECURITY_CSRF_SECRET="your-secure-secret"
- URL validation prevents SSRF attacks
- File size limits prevent resource exhaustion
- Media format validation ensures safe file processing
- Command injection protection for FFmpeg operations
- Sanitized error responses prevent information disclosure
- Structured logging for security event monitoring
- Rate limiting prevents abuse and DoS attacks
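To make the URL validation point more concrete, here is one common way an SSRF check can be written. It is a sketch under stated assumptions, not VideoCraft's actual implementation (which is written in Go): the function names are hypothetical, and the policy of allowing only http/https URLs that resolve to globally routable addresses is an assumption.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def resolve_host(host):
    """Resolve a hostname to the set of IP address strings it maps to."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def is_safe_media_url(url, resolve=resolve_host):
    """Return True only for http(s) URLs whose host resolves to public IPs.

    Loopback, private, link-local, and other non-global ranges are
    rejected so a media URL cannot point at internal services.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        addresses = resolve(parsed.hostname)
    except OSError:
        return False  # unresolvable hosts are treated as unsafe
    if not addresses:
        return False
    for addr in addresses:
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            return False  # unparseable address: fail closed
        if not ip.is_global:
            return False
    return True
```

A check like this is only reliable when the download then connects to the same address that was validated; otherwise a DNS record that changes between the check and the fetch could bypass it.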
graph TB
Client[Web Client] --> API[HTTP API :3002]
API --> Auth[Security Middleware]
Auth --> Jobs[Job Queue]
Jobs --> Audio[Audio Service]
Jobs --> Whisper[Whisper Daemon]
Jobs --> FFmpeg[FFmpeg Service]
Jobs --> Storage[File Storage]
Audio --> Probe[FFprobe Analysis]
Whisper --> AI[OpenAI Whisper]
FFmpeg --> Encoder[Video Encoder]
Storage --> Files[Local Filesystem]
- HTTP API: Gin web framework with security middleware
- Job Queue: Async processing with worker pools
- Audio Service: Duration analysis and metadata extraction
- Whisper Daemon: Persistent Python process for AI transcription
- FFmpeg Service: Secure video composition and encoding
- Storage Service: File management with cleanup policies
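The job queue component follows the familiar bounded-queue-plus-worker-pool pattern. The sketch below illustrates that pattern in Python with threads. It is not the project's Go implementation: `run_worker_pool` and its result shape are hypothetical, and the default worker count and queue size simply mirror the values in the sample configuration.

```python
import queue
import threading


def run_worker_pool(jobs, handler, workers=4, queue_size=100):
    """Process (job_id, payload) pairs with a fixed pool of worker threads.

    Returns a dict mapping job_id to {"status": ..., "error": ...}.
    """
    pending = queue.Queue(maxsize=queue_size)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = pending.get()
            if item is None:  # sentinel: no more work for this worker
                return
            job_id, payload = item
            try:
                handler(payload)
                outcome = {"status": "completed", "error": None}
            except Exception as exc:  # a failed job must not kill the worker
                outcome = {"status": "failed", "error": str(exc)}
            with lock:
                results[job_id] = outcome

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for job in jobs:
        pending.put(job)  # blocks while the queue is full (back-pressure)
    for _ in threads:
        pending.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return results
```

Bounding the queue means that producers slow down when workers fall behind, rather than accumulating unlimited pending jobs in memory.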
VideoCraft supports comprehensive configuration via config.yaml:
server:
  host: "0.0.0.0"
  port: 3002
ffmpeg:
  binary_path: "ffmpeg"
  timeout: "1h"
  quality: 23                  # CRF value (lower = better quality)
  preset: "medium"             # Encoding speed
transcription:
  enabled: true
  daemon:
    enabled: true
    idle_timeout: "300s"       # Shutdown after 5min idle
    startup_timeout: "120s"    # Max startup time
    restart_max_attempts: 3
  python:
    path: "python3"
    model: "base"              # tiny/base/small/medium/large
    language: "auto"           # Auto-detect or specific language
    device: "cpu"              # cpu/cuda
subtitles:
  enabled: true
  style: "progressive"         # progressive/classic
  font_family: "Arial"
  font_size: 24
  position: "center-bottom"
  colors:
    word: "#FFFFFF"
    outline: "#000000"
storage:
  output_dir: "./generated_videos"
  temp_dir: "./temp"
  max_file_size: 1073741824    # 1GB limit
  retention_days: 7
job:
  workers: 4                   # Concurrent job workers
  queue_size: 100              # Max queued jobs
  max_concurrent: 10           # Max concurrent jobs
security:
  rate_limit: 100              # Requests per minute
  enable_auth: true            # API key authentication
  api_key: ""                  # Auto-generated if empty
  enable_csrf: false           # CSRF protection
  allowed_domains: []          # CORS allowed origins
version: '3.8'
services:
  videocraft:
    build: .
    ports:
      - "3002:3002"
    environment:
      - VIDEOCRAFT_SECURITY_ALLOWED_DOMAINS=localhost:3000,yourdomain.com
      - VIDEOCRAFT_SECURITY_API_KEY=your-secure-api-key
    volumes:
      - ./generated_videos:/app/generated_videos
      - ./cache:/app/cache
    security_opt:
      - no-new-privileges:true
    user: "1000:1000"
    read_only: true
    tmpfs:
      - /tmp:size=1G,noexec,nosuid,nodev
# Start the service
docker-compose up -d
# Check logs
docker-compose logs -f videocraft
# Stop the service
docker-compose down
# Build image
docker build -t videocraft .
# Run container
docker run -d \
--name videocraft \
-p 3002:3002 \
-e VIDEOCRAFT_SECURITY_ALLOWED_DOMAINS="localhost:3000" \
-e VIDEOCRAFT_SECURITY_API_KEY="your-key" \
-v $(pwd)/generated_videos:/app/generated_videos \
videocraft
videocraft/
├── cmd/videocraft/ # Application entry point
├── internal/
│ ├── api/ # HTTP handlers and middleware
│ ├── core/ # Business logic and services
│ ├── app/ # Configuration management
│ ├── pkg/ # Shared utilities (logging, errors)
│ └── storage/ # File storage backend
├── scripts/ # Python Whisper daemon
├── config/ # Configuration files
└── docs/ # Technical documentation
# Install dependencies
go mod download
pip install -r scripts/requirements.txt
# Development build
make build
# Run tests
make test
# Run with live reload (requires air)
make dev
# Security scan
make security
# Generate coverage report
make coverage
# Clean build artifacts
make clean
Required for Web Clients:
export VIDEOCRAFT_SECURITY_ALLOWED_DOMAINS="localhost:3000,yourdomain.com"
Optional Security:
export VIDEOCRAFT_SECURITY_API_KEY="your-secure-api-key"
export VIDEOCRAFT_SECURITY_ENABLE_CSRF=true
export VIDEOCRAFT_SECURITY_CSRF_SECRET="your-csrf-secret"
Server Configuration:
export VIDEOCRAFT_SERVER_HOST="0.0.0.0"
export VIDEOCRAFT_SERVER_PORT=3002
Storage Configuration:
export VIDEOCRAFT_STORAGE_OUTPUT_DIR="./generated_videos"
export VIDEOCRAFT_STORAGE_TEMP_DIR="./temp"
Whisper Configuration:
export VIDEOCRAFT_TRANSCRIPTION_PYTHON_MODEL="base"
export VIDEOCRAFT_TRANSCRIPTION_PYTHON_DEVICE="cpu"
Server won't start
# Check port availability
lsof -i :3002
# Verify FFmpeg installation
ffmpeg -version
# Check Python dependencies
python -c "import whisper; print('Whisper OK')"
CORS errors in browser
# Add your domain to allowed list
export VIDEOCRAFT_SECURITY_ALLOWED_DOMAINS="localhost:3000,yourdomain.com"
# Check browser console for specific error
# Look for server logs: docker-compose logs videocraft | grep CORS
Whisper daemon fails
# Test Whisper manually
python scripts/whisper_daemon.py
# Check available models
python -c "import whisper; print(whisper.available_models())"
# For GPU support, install CUDA version of PyTorch
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
Video generation fails
# Verify media URLs are accessible
curl -I "https://your-media-url.com/audio.mp3"
# Check FFmpeg can process your media
ffprobe "https://your-media-url.com/audio.mp3"
# Monitor job status for detailed error messages
curl http://localhost:3002/api/v1/jobs/{job_id} \
-H "Authorization: Bearer your-secure-api-key"
Out of memory errors
# Use smaller Whisper model
export VIDEOCRAFT_TRANSCRIPTION_PYTHON_MODEL="tiny"
# Reduce concurrent jobs
export VIDEOCRAFT_JOB_MAX_CONCURRENT=2
# Increase Docker memory limit
docker run --memory=4g videocraft
# Test API connectivity
curl http://localhost:3002/health
# Get CSRF token
curl http://localhost:3002/api/v1/csrf-token
# Test authentication
curl -H "Authorization: Bearer your-key" http://localhost:3002/health
# Monitor logs
docker-compose logs -f videocraft | grep ERROR
- CPU: 2+ cores (FFmpeg encoding is CPU-intensive)
- Memory: 4GB+ (Whisper models require significant RAM)
- Storage: SSD recommended for video I/O
- Network: High bandwidth for external media downloads
- Use smaller Whisper models (tiny/base) for faster processing
- Enable GPU acceleration if available (CUDA support)
- Implement Redis for job queue in multi-instance deployments
- Use CDN for frequently accessed media files
- Configure FFmpeg presets based on quality vs speed requirements
- Horizontal scaling: Multiple VideoCraft instances behind load balancer
- Dedicated workers: Separate transcription and video processing services
- External storage: S3/MinIO for generated videos
- Queue backend: Redis or RabbitMQ for job distribution
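As an illustration of the quality versus speed trade-off mentioned in the tips above, the sketch below maps a quality level to libx264 `-crf` and `-preset` arguments. The profile table is a hypothetical example: only the "medium" entry mirrors the defaults in the sample configuration (CRF 23, preset medium), and the other values are assumptions rather than VideoCraft's settings.

```python
# Hypothetical quality profiles; lower CRF means higher quality and larger
# files, and slower presets trade encoding time for better compression.
QUALITY_PROFILES = {
    "low": {"crf": 28, "preset": "veryfast"},
    "medium": {"crf": 23, "preset": "medium"},
    "high": {"crf": 18, "preset": "slow"},
}


def encode_args(input_path, output_path, quality="medium"):
    """Build an FFmpeg argument list for the given quality level.

    Returning a list (instead of a shell string) keeps file names from
    being interpreted by a shell when passed to subprocess.run.
    """
    if quality not in QUALITY_PROFILES:
        raise ValueError(f"unknown quality: {quality!r}")
    profile = QUALITY_PROFILES[quality]
    return [
        "ffmpeg",
        "-i", input_path,
        "-c:v", "libx264",
        "-crf", str(profile["crf"]),
        "-preset", profile["preset"],
        output_path,
    ]


if __name__ == "__main__":
    print(" ".join(encode_args("input.mp4", "output.mp4", "high")))
```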
MIT License - see LICENSE file for details.
- Fork the repository
- Create a feature branch (git checkout -b feature/new-feature)
- Make your changes with tests
- Run the full test suite (make test)
- Submit a pull request
- Follow Go best practices and idioms
- Add unit tests for new functionality
- Update documentation for API changes
- Use conventional commit messages
- Ensure security validations for user inputs
Built with Go, FFmpeg, and OpenAI Whisper