Real-time voice-to-text transcription with hotkey support
Maivi (My AI Voice Input) is a cross-platform desktop application that turns your voice into text using a state-of-the-art AI speech-recognition model. Simply press Alt+Q to start recording, and press it again to stop. Your transcription appears in real time and is automatically copied to your clipboard.
- 🎤 Hotkey Recording - Toggle recording with Alt+Q
- ⚡ Real-time Transcription - See text appear as you speak
- 📋 Clipboard Integration - Automatic copy to clipboard
- 🪟 Floating Overlay - Live transcription in a sleek overlay window
- 🔀 Smart Chunk Merging - Advanced overlap-based merging eliminates duplicates
- 💻 CPU-Only - No GPU required (though GPU acceleration is supported)
- 📊 High Accuracy - Powered by the NVIDIA Parakeet TDT 0.6B model (~6-9% WER)
- 🚀 Fast - ~0.36x RTF (processes 7s of audio in ~2.5s on CPU)
Installation:

CPU-only (recommended - much faster, 100MB vs 2GB+):

```bash
pip install maivi --extra-index-url https://download.pytorch.org/whl/cpu
```

Or with GPU support (if you have an NVIDIA GPU):

```bash
pip install maivi --extra-index-url https://download.pytorch.org/whl/cu121
```

Standard install (may download large CUDA files):

```bash
pip install maivi
```
System dependencies (PortAudio):

Linux:

```bash
sudo apt-get install portaudio19-dev python3-pyaudio
```

macOS:

```bash
brew install portaudio
```

Windows:

- PortAudio is usually included with PyAudio
GUI Mode (Recommended):

```bash
maivi
```

Press Alt+Q to start recording, press Alt+Q again to stop. The transcription will appear in a floating overlay and be copied to your clipboard.
CLI Mode:

```bash
# Basic CLI
maivi-cli

# With live terminal UI
maivi-cli --show-ui

# Custom parameters
maivi-cli --window 10 --slide 5 --show-ui
```
Controls:
- Alt+Q - Start/stop recording (toggle mode)
- Esc - Exit application
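Under the hood, a global hotkey toggle can be wired up in just a few lines. Here is a minimal sketch assuming the pynput library (an illustration only, not Maivi's actual implementation, which lives in the GUI code):

```python
# Minimal sketch of an Alt+Q recording toggle, assuming the pynput library.
# Illustrative only; this is not Maivi's actual implementation.
from pynput import keyboard

recording = False

def toggle_recording():
    global recording
    recording = not recording
    print("Recording started" if recording else "Recording stopped")

# GlobalHotKeys runs in a background thread and fires on Alt+Q system-wide.
listener = keyboard.GlobalHotKeys({"<alt>+q": toggle_recording})
listener.start()
listener.join()  # block until the listener thread exits
```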
Maivi uses a streaming architecture:

- Sliding Window Recording - Captures audio in overlapping 7-second chunks every 3 seconds
- Real-time Transcription - Each chunk is transcribed by the NVIDIA Parakeet model
- Smart Merging - Chunks are merged using overlap detection (4-second overlap); a simplified sketch follows the example below
- Live Updates - The UI updates in real-time as transcription progresses
```
Chunk 1: "hello world how are you"
Chunk 2:             "how are you doing today"
                      ^^^^^^^^^^^
                      Overlap detected → merge!

Result:  "hello world how are you doing today"
```
This approach ensures:

- ✅ No words cut mid-syllable
- ✅ Context preserved for better accuracy
- ✅ Seamless merging without duplicates
- ✅ Fast processing (no queue buildup)
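As a rough sketch of the idea, here is a simplified word-level version of the overlap merge (the real implementation in `chunk_merger.py` operates on the 4-second audio overlap, so the details differ):

```python
def merge_chunks(prev: str, new: str, min_overlap: int = 2) -> str:
    """Merge two transcript chunks by finding the longest word suffix of
    `prev` that is also a prefix of `new`, then dropping the duplicate."""
    prev_words, new_words = prev.split(), new.split()
    matched = 0
    # Try the longest possible overlap first, shrinking until a match is found.
    for k in range(min(len(prev_words), len(new_words)), min_overlap - 1, -1):
        if prev_words[-k:] == new_words[:k]:
            matched = k
            break
    return " ".join(prev_words + new_words[matched:])

print(merge_chunks("hello world how are you", "how are you doing today"))
# -> "hello world how are you doing today"
```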
```bash
maivi-cli --window 7.0 --slide 3.0 --delay 2.0
```

- `--window`: Chunk size in seconds (default: 7.0). Larger = better quality, slower processing.
- `--slide`: Slide interval in seconds (default: 3.0). Smaller = more overlap, higher CPU usage. Rule: must be greater than window × 0.36 (the RTF) to avoid queue buildup; see the check after this list.
- `--delay`: Processing start delay in seconds (default: 2.0).
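That rule is just the real-time constraint: each chunk must finish transcribing before the next one arrives. A quick sanity check, using the ~0.36 RTF figure from this README (the helper function is hypothetical):

```python
RTF = 0.36  # real-time factor on CPU (~0.36x, per the performance notes)

def keeps_up(window: float, slide: float, rtf: float = RTF) -> bool:
    # A chunk of `window` seconds takes window * rtf seconds to transcribe;
    # a new chunk arrives every `slide` seconds, so processing must be faster.
    return window * rtf < slide

print(keeps_up(7.0, 3.0))   # True:  2.52s < 3s (default settings are safe)
print(keeps_up(10.0, 3.0))  # False: 3.6s > 3s -> queue buildup
```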
```bash
# Speed adjustment (experimental)
maivi-cli --speed 1.5

# Custom UI width
maivi-cli --show-ui --ui-width 50

# Disable pause detection
maivi-cli --no-pause-breaks

# Stream to file (for voice commands)
maivi-cli --output-file transcription.txt
```
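A downstream script can watch the `--output-file` for new text and react to it. A minimal sketch (the file name and trigger phrase are illustrative assumptions, not part of Maivi):

```python
# Illustrative consumer for --output-file; not part of Maivi itself.
# Assumes the transcript file already exists.
import time

def follow(path: str):
    """Yield lines appended to `path`, polling once per second."""
    with open(path, "r") as f:
        while True:
            line = f.readline()
            if line:
                yield line.strip().lower()
            else:
                time.sleep(1.0)

for text in follow("transcription.txt"):
    if "open browser" in text:  # example trigger phrase
        print("Voice command detected: open browser")
```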
Maivi can be packaged as standalone executables for easy distribution:

```bash
# Install build dependencies
pip install "maivi[build]"

# Build executable
pyinstaller --onefile --windowed \
  --name maivi \
  --add-data "src/maivi:maivi" \
  src/maivi/__main__.py
```
Pre-built executables are available in Releases.
```bash
# Clone repository
git clone https://github.com/MaximeRivest/maivi.git
cd maivi

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest
```
Project structure:

```
maivi/
├── src/maivi/
│   ├── __init__.py
│   ├── __main__.py          # GUI entry point
│   ├── core/
│   │   ├── streaming_recorder.py
│   │   ├── chunk_merger.py
│   │   └── pause_detector.py
│   ├── gui/
│   │   └── qt_gui.py
│   ├── cli/
│   │   ├── cli.py
│   │   ├── server.py
│   │   └── terminal_ui.py
│   └── utils/
├── tests/
├── docs/
├── pyproject.toml
├── README.md
└── LICENSE
```
Troubleshooting:

Transcription contains "..." markers: This is expected behavior when there are long pauses (5+ seconds of silence). The system adds "..." gap markers to indicate the pause.
Transcription falls behind: Check that processing time < slide interval:

- Processing time per chunk: window_seconds × 0.36 (the RTF)
- This should be less than slide_seconds
- Default: 7 × 0.36 = 2.52s < 3s ✅
Model download fails: The first run downloads the NVIDIA Parakeet model (~600MB) from HuggingFace. If the download fails:

- Check your internet connection
- Verify HuggingFace is accessible
- Clear the cache: `rm -rf ~/.cache/huggingface/`
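You can also pre-download the model before the first run. A sketch assuming NeMo is installed; the exact model id below is an assumption, so check Maivi's source for the one it actually loads:

```python
# Hypothetical pre-download of the Parakeet model via NeMo.
# The model id is an assumption; verify it against Maivi's source.
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="nvidia/parakeet-tdt-0.6b-v2"
)
print("Model downloaded and cached.")
```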
GUI crashes on Linux:

```bash
# Check Qt installation
python -c "from PySide6 import QtWidgets; print('Qt OK')"

# Fall back to CLI mode
maivi-cli --show-ui
```
Performance:

Memory:
- Model: ~2GB RAM
- Audio buffer: ~1MB
- Total: ~2.5GB RAM
CPU:
- Idle: <5% CPU
- Recording: 30-40% of 1 core
- Transcription: 100% of 1 core (during processing)
Latency:
- First transcription: 2s (start delay)
- Updates: Every 3s (slide interval)
- Completion: 1-3s after recording stops
Accuracy:
- Model WER: ~5-8%
- Overlap merging: <1% word loss
- Total effective WER: ~6-9%
Roadmap:

v0.2 - Platform Support:
- Test and verify macOS support
- Test and verify Windows support
- Platform-specific installers (.app, .exe)
v0.3 - Features:
- Configurable hotkeys via GUI
- Multi-language support
- Custom model selection
- Voice commands support
v0.4 - Optimization:
- GPU acceleration (CUDA)
- Export formats (JSON, SRT)
- Text editor integration
- Plugin system
MIT License - see LICENSE file for details.
Acknowledgments:

- Built with the NVIDIA NeMo ASR toolkit
- Uses the NVIDIA Parakeet TDT 0.6B model
- GUI powered by PySide6
Contributions are welcome! Please feel free to submit a Pull Request:

- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Support:

- 📫 Create an issue
- 💡 Feature requests
- 🐛 Bug reports
Made with ❤️ by Maxime Rivest