forked from MaximeRivest/maivi

Maivi - My AI Voice Input: Real-time voice-to-text local on cpu better than whisper with hotkey support

WTFoss/maivi


Maivi - My AI Voice Input 🎤

Real-time voice-to-text transcription with hotkey support

Maivi (My AI Voice Input) is a cross-platform desktop application that turns your voice into text using state-of-the-art AI models. Simply press Alt+Q to start recording, and press again to stop. Your transcription appears in real-time and is automatically copied to your clipboard.


✨ Features

  • 🎤 Hotkey Recording - Toggle recording with Alt+Q
  • ⚡ Real-time Transcription - See text appear as you speak
  • 📋 Clipboard Integration - Automatic copy to clipboard
  • 🪟 Floating Overlay - Live transcription in a sleek overlay window
  • 🔄 Smart Chunk Merging - Advanced overlap-based merging eliminates duplicates
  • 💻 CPU-Only - No GPU required (though GPU acceleration is supported)
  • 🌍 High Accuracy - Powered by NVIDIA Parakeet TDT 0.6B model (~6-9% WER)
  • 🚀 Fast - ~0.36x RTF (processes 7s audio in 2.5s on CPU)

🚀 Quick Start

Installation

CPU-only (Recommended - much faster to install, 100MB vs 2GB+ download):

pip install maivi --extra-index-url https://download.pytorch.org/whl/cpu

Or with GPU support (if you have NVIDIA GPU):

pip install maivi --extra-index-url https://download.pytorch.org/whl/cu121

Standard install (may download large CUDA files):

pip install maivi

System Requirements

Linux:

sudo apt-get install portaudio19-dev python3-pyaudio

macOS:

brew install portaudio

Windows:

  • PortAudio is usually included with PyAudio

Usage

GUI Mode (Recommended):

maivi

Press Alt+Q to start recording, press Alt+Q again to stop. The transcription will appear in a floating overlay and be copied to your clipboard.

CLI Mode:

# Basic CLI
maivi-cli

# With live terminal UI
maivi-cli --show-ui

# Custom parameters
maivi-cli --window 10 --slide 5 --show-ui

Controls:

  • Alt+Q - Start/stop recording (toggle mode)
  • Esc - Exit application

📖 How It Works

Maivi uses a sliding-window streaming architecture:

  1. Sliding Window Recording - Captures audio in overlapping 7-second chunks every 3 seconds
  2. Real-time Transcription - Each chunk is transcribed by the NVIDIA Parakeet model
  3. Smart Merging - Chunks are merged using overlap detection (4-second overlap)
  4. Live Updates - The UI updates in real-time as transcription progresses
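
The chunk schedule in step 1 can be sketched as a small generator. This is an illustration only, not Maivi's recorder code: `chunk_windows` is a hypothetical helper, and it assumes that a new chunk starts every `slide` seconds and spans up to `window` seconds of audio.

```python
from typing import Iterator, Tuple


def chunk_windows(total_seconds: float, window: float = 7.0,
                  slide: float = 3.0) -> Iterator[Tuple[float, float]]:
    """Yield (start, end) times of overlapping chunks covering a recording.

    Hypothetical helper: assumes chunk k starts at k * slide seconds.
    """
    if window <= 0 or slide <= 0:
        raise ValueError("window and slide must be positive")
    start = 0.0
    while True:
        # The last chunk is clipped to the end of the recording.
        end = min(start + window, total_seconds)
        yield (start, end)
        if end >= total_seconds:
            break
        start += slide


# A 13-second recording with the default 7s window and 3s slide.
for start, end in chunk_windows(13.0):
    print(f"chunk covers {start:.0f}s to {end:.0f}s")
```

With the defaults, consecutive chunks share 7 - 3 = 4 seconds of audio, which is the overlap used for merging in step 3.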

Why Overlapping Chunks?

Chunk 1: "hello world how are you"
Chunk 2: "how are you doing today"
          ^^^^^^^^^^^
          Overlap detected → merge!

Result: "hello world how are you doing today"

This approach ensures:

  • ✅ No words cut mid-syllable
  • ✅ Context preserved for better accuracy
  • ✅ Seamless merging without duplicates
  • ✅ Fast processing (no queue buildup)
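
A minimal sketch of the word-level merge from the example above. It is not the code in `chunk_merger.py`: `merge_chunks` is a hypothetical function that assumes overlapping words match exactly, whereas real transcriptions of the same audio can differ in casing, punctuation, or wording and would need fuzzier matching. The "..." gap marker for chunks with no shared words follows the behavior described under Troubleshooting.

```python
def merge_chunks(previous: str, current: str) -> str:
    """Append `current` to `previous`, dropping words repeated at the seam.

    Hypothetical helper: uses exact word matching, which is a simplification.
    """
    prev_words = previous.split()
    curr_words = current.split()
    if not prev_words:
        return " ".join(curr_words)
    max_len = min(len(prev_words), len(curr_words))
    # Try the longest candidate overlap first so repeated phrases collapse fully.
    for size in range(max_len, 0, -1):
        if prev_words[-size:] == curr_words[:size]:
            return " ".join(prev_words + curr_words[size:])
    # No shared words at the seam: keep both chunks and mark the gap.
    return " ".join(prev_words + ["..."] + curr_words)


print(merge_chunks("hello world how are you", "how are you doing today"))
# prints: hello world how are you doing today
```

Searching from the longest overlap down avoids leaving a partial duplicate when a short phrase happens to repeat inside the overlapping region.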

⚙️ Configuration

Chunk Parameters

maivi-cli --window 7.0 --slide 3.0 --delay 2.0

  • --window: Chunk size in seconds (default: 7.0)
    • Larger = better quality, slower processing
  • --slide: Slide interval in seconds (default: 3.0)
    • Smaller = more overlap, higher CPU usage
    • Rule: Must be > window × 0.36 (the CPU RTF) so each chunk finishes processing before the next one arrives
  • --delay: Processing start delay in seconds (default: 2.0)
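
The rule for --slide can be checked before starting a session. The sketch below is a hypothetical helper, not part of Maivi's CLI; it assumes the 0.36 RTF quoted in this README, which will vary by machine.

```python
def check_chunk_params(window: float, slide: float, rtf: float = 0.36) -> float:
    """Return the spare seconds per slide interval.

    Raises ValueError when processing one chunk would take at least as long
    as the slide interval, which means the queue would grow without bound.
    Hypothetical helper; `rtf` defaults to the README's CPU figure.
    """
    processing = window * rtf  # estimated seconds to transcribe one chunk
    if processing >= slide:
        raise ValueError(
            f"slide={slide}s is too small: a {window}s chunk needs about "
            f"{processing:.2f}s to process"
        )
    return slide - processing


# Defaults: 7 x 0.36 = 2.52s of processing per 3s slide.
print(f"headroom: {check_chunk_params(7.0, 3.0):.2f}s")
```

For example, `--window 10 --slide 5` passes (3.6s of processing per 5s slide), while `--window 10 --slide 3` would raise.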

Advanced Options

# Speed adjustment (experimental)
maivi-cli --speed 1.5

# Custom UI width
maivi-cli --show-ui --ui-width 50

# Disable pause detection
maivi-cli --no-pause-breaks

# Stream to file (for voice commands)
maivi-cli --output-file transcription.txt

📦 Building Executables

Maivi can be packaged as standalone executables for easy distribution:

# Install build dependencies
pip install "maivi[build]"

# Build executable
pyinstaller --onefile --windowed \
  --name maivi \
  --add-data "src/maia:maia" \
  src/maia/__main__.py

Pre-built executables are available in Releases.

🏗️ Development

Setup Development Environment

# Clone repository
git clone https://github.com/MaximeRivest/maivi.git
cd maivi

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

Project Structure

maia/
├── src/maia/
│   ├── __init__.py
│   ├── __main__.py           # GUI entry point
│   ├── core/
│   │   ├── streaming_recorder.py
│   │   ├── chunk_merger.py
│   │   └── pause_detector.py
│   ├── gui/
│   │   └── qt_gui.py
│   ├── cli/
│   │   ├── cli.py
│   │   ├── server.py
│   │   └── terminal_ui.py
│   └── utils/
├── tests/
├── docs/
├── pyproject.toml
├── README.md
└── LICENSE

🐛 Troubleshooting

"No overlap found" warnings

This is expected behavior when there are long pauses (5+ seconds of silence). The system adds "..." gap markers to indicate the pause.

Queue buildup (transcription continues after stopping)

Check that processing time < slide interval:

  • Processing: window_seconds × 0.36 (RTF)
  • Should be < slide_seconds
  • Default: 7 × 0.36 = 2.52s < 3s ✅
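
The 0.36 figure is this README's CPU measurement, so the check is only as good as the RTF on your own hardware. A rough way to measure it is to time one transcription call and divide by the audio length. In this sketch `measure_rtf` and the `transcribe` callable are hypothetical; substitute whatever function runs the model on a clip of known duration.

```python
import time
from typing import Callable


def measure_rtf(transcribe: Callable[[], object], audio_seconds: float) -> float:
    """Return the real-time factor: processing time / audio duration.

    Hypothetical helper; `transcribe` is any zero-argument callable that
    processes a clip lasting `audio_seconds`.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    transcribe()
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


# Stand-in workload: a short sleep in place of a real model call on 7s of audio.
rtf = measure_rtf(lambda: time.sleep(0.05), audio_seconds=7.0)
print(f"measured RTF: {rtf:.3f}")
```

If the measured RTF is higher than 0.36, raise --slide or lower --window until window × RTF is below the slide interval.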

Model download issues

The first run downloads the NVIDIA Parakeet model (~600MB) from HuggingFace. If download fails:

  • Check internet connection
  • Verify HuggingFace is accessible
  • Clear cache: rm -rf ~/.cache/huggingface/ (this removes every cached Hugging Face model, not only Parakeet)

Qt/GUI crashes

If the GUI crashes on Linux:

# Check Qt installation
python -c "from PySide6 import QtWidgets; print('Qt OK')"

# Fall back to CLI mode
maivi-cli --show-ui

📊 Performance

Memory:

  • Model: ~2GB RAM
  • Audio buffer: ~1MB
  • Total: ~2.5GB RAM

CPU:

  • Idle: <5% CPU
  • Recording: 30-40% of 1 core
  • Transcription: 100% of 1 core (during processing)

Latency:

  • First transcription: 2s (start delay)
  • Updates: Every 3s (slide interval)
  • Completion: 1-3s after recording stops

Accuracy:

  • Model WER: ~5-8%
  • Overlap merging: <1% word loss
  • Total effective WER: ~6-9%

🗺️ Roadmap

v0.2 - Platform Support:

  • Test and verify macOS support
  • Test and verify Windows support
  • Platform-specific installers (.app, .exe)

v0.3 - Features:

  • Configurable hotkeys via GUI
  • Multi-language support
  • Custom model selection
  • Voice commands support

v0.4 - Optimization:

  • GPU acceleration (CUDA)
  • Export formats (JSON, SRT)
  • Text editor integration
  • Plugin system

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

💬 Support


Made with ❤️ by Maxime Rivest
