Real-time Air Gesture Recognition Desktop Application
AirCut is a modern desktop application that enables intuitive air gesture recognition using computer vision and machine learning. Draw gestures in the air with your hand and execute custom commands through advanced gesture recognition algorithms.
- Real-time Hand Tracking - Advanced computer vision using Roboflow inference
- Air Gesture Recording - Capture complex gesture trajectories in 3D space
- Template-based Recognition - Create and manage custom gesture templates
- Command Execution - Link gestures to system commands or actions
- Silent Operation - Clean, distraction-free user experience
- High-Performance Streaming - Optimized 25 FPS processing with 240x180 inference resolution
- Local Processing - No cloud dependencies, complete privacy
- Real-time Feedback - Instant visual feedback during gesture recording
- Modern UI - Beautiful dark/light mode interface with smooth animations
- Cross-Platform - Built with Tauri for native desktop performance
- Dynamic Time Warping (DTW) - Advanced gesture matching algorithm
- Confidence Thresholds - Adjustable detection and recognition sensitivity
- Stateless Architecture - Efficient client-server communication
- WebSocket Communication - Real-time bidirectional data exchange
- Frame-by-Frame Processing - Optimized video processing pipeline
AirCut follows a modern client-server architecture designed for performance and maintainability:
┌─────────────────────────────────────────────────────────────┐
│ AirCut Desktop App │
│ (Tauri) │
├─────────────────────────────────────────────────────────────┤
│ Frontend (React + TypeScript) │
│ ├── Video Capture & Streaming │
│ ├── Gesture Visualization │
│ ├── Template Management (Client-side Storage) │
│ └── Real-time UI Updates │
├─────────────────────────────────────────────────────────────┤
│ Backend Communication │
│ ├── WebSocket Camera Stream (/ws/frames) │
│ ├── WebSocket Gesture Recognition (/ws/gestures) │
│ └── REST API Health Checks │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Backend Server │
│ (Python FastAPI) │
├─────────────────────────────────────────────────────────────┤
│ Frame Processing Pipeline │
│ ├── Frame Decoding & Preprocessing │
│ ├── Roboflow ML Inference │
│ ├── Hand Detection │
│ └── Coordinate Normalization │
├─────────────────────────────────────────────────────────────┤
│ Gesture Recognition Engine │
│ ├── Trajectory Processing │
│ ├── DTW-based Template Matching │
│ ├── Confidence Scoring │
│ └── Stateless Recognition Service │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Roboflow Inference API │
│ (Computer Vision ML) │
├─────────────────────────────────────────────────────────────┤
│ Hand Detection Model │
│ ├── Real-time Hand Tracking │
│ ├── Bounding Box Detection │
│ ├── Confidence Scoring │
│ └── Coordinate Extraction │
└─────────────────────────────────────────────────────────────┘
- Video Capture: Frontend captures webcam frames at 30 FPS
- Camera Stream: Compressed frames sent to backend via Camera Stream WebSocket (/ws/frames); see the client sketch after this list
- ML Inference: Backend processes frames using Roboflow API (25 FPS)
- Detection Results: Hand coordinates sent back to frontend via Camera Stream WebSocket
- Gesture Recording: Frontend tracks hand movements when drawing
- Recognition: Recorded trajectories compared against stored templates via Gesture WebSocket (/ws/gestures)
- Command Execution: Matched gestures trigger associated commands
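The camera-stream leg of this flow can be sketched as a minimal Python client. This is illustrative only: the production client is the TypeScript frameStreamService, and the `websockets` and `opencv-python` packages used here are assumptions, not project dependencies.

```python
# Minimal frame-streaming client sketch (illustrative; the real client is the
# TypeScript frameStreamService). Assumes `websockets` and `opencv-python`.
import asyncio
import base64
import json

import cv2
import websockets

async def stream_frames(url: str = "ws://127.0.0.1:8000/ws/frames") -> None:
    cap = cv2.VideoCapture(0)  # default webcam
    async with websockets.connect(url) as ws:

        async def read_detections() -> None:
            # Detections arrive asynchronously; the backend may skip frames,
            # so replies are not one-to-one with sent frames.
            async for raw in ws:
                msg = json.loads(raw)
                if msg.get("type") == "detection":
                    print("hand:", msg["detection"])

        reader = asyncio.create_task(read_detections())
        try:
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                # JPEG-compress the frame and wrap it in the base64 data-URL
                # message format the backend expects.
                _, buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 75])
                payload = base64.b64encode(buf.tobytes()).decode("ascii")
                await ws.send(json.dumps({
                    "type": "frame",
                    "frame": f"data:image/jpeg;base64,{payload}",
                }))
                await asyncio.sleep(1 / 25)  # stay near the 25 FPS budget
        finally:
            reader.cancel()
            cap.release()

asyncio.run(stream_frames())
```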
- Tauri - Rust-based desktop app framework
- React 18 - Modern UI library with hooks
- TypeScript - Type-safe JavaScript
- Tailwind CSS - Utility-first CSS framework
- Zustand - Lightweight state management
- Sonner - Toast notifications (minimal usage)
- FastAPI - High-performance Python web framework
- WebSockets - Real-time bidirectional communication
- OpenCV - Computer vision and image processing
- NumPy - Numerical computing for trajectory processing
- Uvicorn - ASGI server for production performance
- Roboflow - Computer vision platform and inference API
- Inference - Roboflow Inference Python SDK, used to run the model in the Python processing pipeline
- Dynamic Time Warping (DTW) - Custom gesture matching algorithm (sketched below)
- Trajectory Normalization - Mathematical gesture standardization
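For illustration, here is a minimal sketch of both ideas: trajectory normalization plus a classic DTW distance. The function names are ours for this sketch, not the app's actual API (that lives in SimpleGestureRecognizer in main.py):

```python
# Minimal sketch of trajectory normalization + DTW matching. Function names
# are illustrative; the real logic lives in SimpleGestureRecognizer in main.py.
import numpy as np

def normalize_trajectory(points: np.ndarray) -> np.ndarray:
    """Translate to the centroid and scale to unit size so that position
    and scale of the drawn gesture don't affect matching."""
    pts = points - points.mean(axis=0)
    scale = np.abs(pts).max() or 1.0  # guard against a degenerate gesture
    return pts / scale

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping over 2-D points."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)  # length-normalized distance

# Lower distance means a better match; a similarity score can be derived from it.
gesture = normalize_trajectory(np.array([[0.1, 0.2], [0.15, 0.25], [0.3, 0.4]]))
template = normalize_trajectory(np.array([[0.0, 0.0], [0.2, 0.2], [0.4, 0.5]]))
print(dtw_distance(gesture, template))
```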
- Vite - Fast build tool and dev server
- Rust - Systems programming for Tauri
- Python 3.8+ - Backend runtime environment
Before installing AirCut, ensure you have the following:
- Operating System: Windows 10+, macOS 10.15+, or Linux
- RAM: Minimum 4GB, recommended 8GB+
- Storage: 500MB free space
- Webcam: Built-in or external USB webcam
- Internet: Required for initial setup and ML model downloads
- Roboflow Account - Free account at roboflow.com
- API Key - Generate from your Roboflow dashboard
git clone https://github.com/furkanksl/aircut.git
cd aircut
npm install
cd backend
python3.11 -m venv venv
# Activate virtual environment
source venv/bin/activate # macOS/Linux
# or
.\venv\Scripts\activate # Windows
pip install -r requirements.txt
Create environment configuration:
# Create backend/.env file
cat > backend/.env << EOF
ROBOFLOW_API_KEY=your_roboflow_api_key_here
ROBOFLOW_MODEL_ID=handdetection-qycc7/1
CONFIDENCE_THRESHOLD=0.2
EOF
cd backend
source venv/bin/activate # or .\venv\Scripts\activate on Windows
python3.11 main.py
npm run tauri dev
- Allow camera permissions when prompted
- Wait for backend connection (green indicator)
- Try recording your first gesture!
- Create Virtual Environment
cd backend
python3.11 -m venv venv
source venv/bin/activate  # macOS/Linux
.\venv\Scripts\activate  # Windows
- Install Python Dependencies
pip install --upgrade pip
pip install -r requirements.txt
pip install inference
- Verify Installation
python3.11 -c "import cv2, fastapi, inference; print('All dependencies installed successfully!')"
- Install Node Dependencies
npm install
- Install Tauri CLI
npm install -g @tauri-apps/cli
- Verify Rust Installation
rustc --version
cargo --version
# Required
ROBOFLOW_API_KEY=your_api_key_here
ROBOFLOW_MODEL_ID=handdetection-qycc7/1
# Optional (with defaults)
CONFIDENCE_THRESHOLD=0.2
SERVER_HOST=127.0.0.1
SERVER_PORT=8000
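A sketch of how the backend might load these settings, assuming the python-dotenv package (an assumption for illustration, not a confirmed project dependency):

```python
# Sketch of reading the settings above, assuming python-dotenv.
import os
from dotenv import load_dotenv

load_dotenv()  # reads backend/.env when run from the backend directory

ROBOFLOW_API_KEY = os.environ["ROBOFLOW_API_KEY"]  # required; raises if missing
ROBOFLOW_MODEL_ID = os.getenv("ROBOFLOW_MODEL_ID", "handdetection-qycc7/1")
CONFIDENCE_THRESHOLD = float(os.getenv("CONFIDENCE_THRESHOLD", "0.2"))
SERVER_HOST = os.getenv("SERVER_HOST", "127.0.0.1")
SERVER_PORT = int(os.getenv("SERVER_PORT", "8000"))
```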
- Visit roboflow.com and create a free account
- Go to your account settings
- Navigate to the "API" section
- Copy your API key
- Paste it in your .env file
- Start the Application
  - Launch both backend and frontend servers
  - Ensure camera permissions are granted
  - Wait for green connection indicator
- Record a Gesture
  - Click the blue "Play" button or wait for auto-start
  - Move your hand in the air to draw a gesture
  - The app tracks your movement with visual feedback
  - Click "Stop" or remove your hand to finish recording
- Save Template
  - Click "Save" after recording a gesture
  - Enter a descriptive name (e.g., "Circle", "Wave")
  - Optionally add a command (e.g., "open browser")
  - Click "Save Template"
- Recognize Gestures
  - Record a new gesture
  - The app automatically compares it to your saved templates
  - See recognition results with confidence scores
  - Execute associated commands
- Hand Detection: Adjust sensitivity for hand detection
- Gesture Recognition: Set threshold for template matching
- Real-time updates without restart required
- View all saved gestures in the Library panel
- Delete unwanted templates
- Templates persist between sessions
- Export/import capabilities (planned)
- Real-time FPS display
- Connection status indicators
- Frame processing statistics
- Detection confidence tracking
- Use good lighting conditions
- Keep hand clearly visible to camera
- Draw gestures at moderate speed
- Make distinct, repeatable movements
- Avoid background clutter
- Create templates with consistent movements
- Use unique gesture shapes
- Avoid overly similar gestures
- Adjust confidence thresholds if needed
- Record multiple templates for variations
# In main.py
DETECTION_FPS = 25 # Inference frequency
INFERENCE_SIZE = 240 # Resolution for ML processing
JPEG_QUALITY = 0.75 # Stream compression
SKIP_FRAMES = 2 # Process every 3rd frame
# Default confidence thresholds
HAND_DETECTION_CONFIDENCE = 0.5 # Hand detection sensitivity
GESTURE_RECOGNITION_CONFIDENCE = 0.6 # Template matching threshold
AUTO_START_DELAY = 1.0 # Seconds before auto-start
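One plausible way the frame-skip and FPS settings can gate inference calls is sketched below; this is illustrative, not the exact logic in main.py:

```python
# Illustrative sketch of SKIP_FRAMES / DETECTION_FPS style throttling;
# the actual gating in main.py may differ.
import time

SKIP_FRAMES = 2      # process every 3rd frame
DETECTION_FPS = 25   # upper bound on inference calls per second

frame_index = 0
last_inference = 0.0

def should_run_inference() -> bool:
    global frame_index, last_inference
    frame_index += 1
    if frame_index % (SKIP_FRAMES + 1) != 0:  # drop 2 of every 3 frames
        return False
    now = time.monotonic()
    if now - last_inference < 1.0 / DETECTION_FPS:  # rate-limit inference
        return False
    last_inference = now
    return True
```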
- Light/Dark mode toggle in top bar
- Automatic system theme detection
- Persistent theme preferences
// Adjustable in UI
trajectoryColor: string; // Gesture trail color
boundingBoxStyle: object; // Hand detection visualization
confidenceDisplay: boolean; // Show confidence scores
WebSocket /ws/frames
Purpose: Real-time camera frame processing and hand detection
Client → Server Messages:
{
"type": "frame",
"frame": "data:image/jpeg;base64,/9j/4AAQ..."
}
{
"type": "update_confidence",
"hand_detection_confidence": 0.5,
"gesture_recognition_confidence": 0.6
}
Server → Client Messages:
{
"type": "detection",
"detection": {
"x": 320.5,
"y": 240.3,
"width": 80.2,
"height": 75.8,
"confidence": 0.87,
"class": "hand"
},
"timestamp": 1699123456.789
}
{
"type": "connection_established",
"message": "Camera stream ready",
"current_hand_confidence": 0.5,
"current_gesture_confidence": 0.6
}
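A skeleton of a /ws/frames handler matching the message shapes above; run_hand_detection() is a hypothetical stand-in for the Roboflow inference call, and error/disconnect handling is omitted for brevity:

```python
# Skeleton /ws/frames handler for the messages documented above.
# run_hand_detection() is a hypothetical stand-in for the Roboflow call;
# it would return a detection dict (x, y, width, height, confidence, class)
# or None when no hand is found.
import base64
import time

import cv2
import numpy as np
from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws/frames")
async def ws_frames(ws: WebSocket) -> None:
    await ws.accept()
    await ws.send_json({
        "type": "connection_established",
        "message": "Camera stream ready",
        "current_hand_confidence": 0.5,
        "current_gesture_confidence": 0.6,
    })
    while True:
        msg = await ws.receive_json()
        if msg["type"] == "frame":
            # Strip the data-URL prefix and decode the JPEG payload.
            b64 = msg["frame"].split(",", 1)[1]
            frame = cv2.imdecode(
                np.frombuffer(base64.b64decode(b64), np.uint8), cv2.IMREAD_COLOR
            )
            detection = run_hand_detection(frame)  # hypothetical helper
            if detection is not None:
                await ws.send_json({
                    "type": "detection",
                    "detection": detection,
                    "timestamp": time.time(),
                })
        elif msg["type"] == "update_confidence":
            ...  # update thresholds from msg fields
```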
WebSocket /ws/gestures
Purpose: Template management and gesture recognition
Client → Server Messages:
{
"type": "recognize_gesture",
"trajectory": [
{"x": 0.1, "y": 0.2},
{"x": 0.15, "y": 0.25},
...
],
"confidence_threshold": 0.6,
"templates": [
{
"name": "Circle",
"command": "open browser",
"trajectory": [...]
}
]
}
{
"type": "start_tracking",
"message": "Begin hand tracking"
}
{
"type": "stop_tracking",
"message": "Stop hand tracking"
}
Server → Client Messages:
{
"type": "gesture_recognized",
"template_name": "Circle",
"similarity": 0.85,
"command": "open browser"
}
{
"type": "gesture_not_recognized",
"message": "No matching gesture found"
}
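The recognition flow can be sketched as follows, reusing normalize_trajectory and dtw_distance from the DTW sketch earlier; the distance-to-similarity mapping here is one possible choice, not necessarily the one main.py uses:

```python
# Sketch of the /ws/gestures recognition flow. Assumes normalize_trajectory()
# and dtw_distance() from the DTW sketch above; not the exact backend code.
import numpy as np
from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws/gestures")
async def ws_gestures(ws: WebSocket) -> None:
    await ws.accept()
    while True:
        msg = await ws.receive_json()
        if msg["type"] != "recognize_gesture":
            continue
        gesture = normalize_trajectory(
            np.array([[p["x"], p["y"]] for p in msg["trajectory"]])
        )
        best_name, best_cmd, best_sim = None, None, 0.0
        for tpl in msg["templates"]:
            ref = normalize_trajectory(
                np.array([[p["x"], p["y"]] for p in tpl["trajectory"]])
            )
            # Map DTW distance into a 0..1 similarity (one possible choice).
            sim = 1.0 / (1.0 + dtw_distance(gesture, ref))
            if sim > best_sim:
                best_name, best_cmd, best_sim = tpl["name"], tpl.get("command"), sim
        if best_sim >= msg.get("confidence_threshold", 0.6):
            await ws.send_json({
                "type": "gesture_recognized",
                "template_name": best_name,
                "similarity": round(best_sim, 2),
                "command": best_cmd,
            })
        else:
            await ws.send_json({
                "type": "gesture_not_recognized",
                "message": "No matching gesture found",
            })
```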
GET /health
Purpose: Health check and system status
Response:
{
"status": "healthy",
"camera_active": true,
"tracking_enabled": true,
"frame_count": 1234,
"detection_count": 567,
"inference_available": true,
"model_loaded": true
}
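A quick way to probe this endpoint from Python, assuming the requests package:

```python
# Quick health probe, assuming the `requests` package is installed.
import requests

status = requests.get("http://127.0.0.1:8000/health", timeout=2).json()
print(status["status"], "| model loaded:", status["model_loaded"])
```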
MJPEG Stream (legacy)
Purpose: MJPEG video stream, not used in the current version
Response: Multipart MJPEG stream
aircut/
├── desktop/ # Tauri desktop application
│ ├── src/ # React TypeScript source
│ │ ├── components/ # React components
│ │ │ ├── VideoFeed.tsx # Camera and visualization
│ │ │ └── ui/ # UI component library
│ │ ├── services/ # Business logic
│ │ │ └── frameStreamService.ts
│ │ ├── stores/ # State management
│ │ │ └── appStore.ts # Zustand store
│ │ ├── hooks/ # Custom React hooks
│ │ └── App.tsx # Main application
│ ├── src-tauri/ # Rust backend for Tauri
│ │ ├── src/main.rs # Tauri main process
│ │ └── Cargo.toml # Rust dependencies
│ ├── package.json # Node.js dependencies
│ └── tauri.conf.json # Tauri configuration
├── backend/ # Python FastAPI server
│ ├── main.py # Main server application
│ ├── requirements.txt # Python dependencies
│ └── .env # Environment variables
├── README.md # This documentation
└── .gitignore # Git ignore rules
# Start development server
npm run tauri dev
# Build for production
npm run tauri build
# Run tests
npm run test
# Lint code
npm run lint
# Format code
npm run format
# Start development server with auto-reload
python3.11 main.py
# Run with uvicorn directly
uvicorn main:app --host 127.0.0.1 --port 8000 --reload
- Record a gesture using the UI
- Templates are stored in browser localStorage
- Access via useAppStore().templates
- Modify the SimpleGestureRecognizer class in main.py
- Implement new similarity calculation methods (see the sketch below)
- Add configuration options for new parameters
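As an example of the kind of metric that could be added, here is a deliberately tiny, hypothetical similarity function; the actual SimpleGestureRecognizer interface in main.py may differ:

```python
# Hypothetical example of an alternative similarity metric that could be
# wired into SimpleGestureRecognizer; the class interface may differ.
import numpy as np

def endpoint_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Toy metric: compare start/end points of two normalized trajectories."""
    d = np.linalg.norm(a[0] - b[0]) + np.linalg.norm(a[-1] - b[-1])
    return 1.0 / (1.0 + d)  # 1.0 = identical endpoints, smaller = further apart
```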
- Create a new component in desktop/src/components/
- Follow existing component patterns
- Use Tailwind CSS for styling
- Integrate with Zustand store for state
- Use functional components with hooks
- Implement proper TypeScript types
- Follow React best practices
- Use Tailwind for styling
- Follow PEP 8 style guide
- Use type hints where appropriate
- Implement proper error handling
- Write docstrings for functions
Symptoms: Black screen, "Camera not available" message
Solutions:
- Check camera permissions in system settings
- Close other applications using the camera
- Try different camera index (if multiple cameras)
- Restart the application
Symptoms: Red connection indicator, WebSocket errors
Solutions:
- Verify backend server is running on port 8000
- Check that both WebSocket endpoints (/ws/frames and /ws/gestures) are available
- Check firewall settings
- Ensure no other service is using port 8000
- Restart backend server
Symptoms: No bounding boxes around hands
Solutions:
- Verify Roboflow API key is correct
- Check that the Camera Stream WebSocket (/ws/frames) is connected
- Check internet connection for API calls
- Improve lighting conditions
- Lower hand detection confidence threshold
- Ensure hand is clearly visible to camera
Symptoms: Gestures not recognized or wrong matches
Solutions:
- Verify that the Gesture WebSocket (/ws/gestures) is connected
- Record more distinct gesture templates
- Adjust gesture recognition confidence
- Ensure consistent gesture recording
- Avoid overly similar gesture shapes
- Re-record templates with better technique
Symptoms: Low FPS, lag, high CPU usage
Solutions:
- Close unnecessary applications
- Reduce video resolution in camera settings
- Increase frame skip rate in configuration
- Use GPU-accelerated inference if available
- Check system resource usage
# Backend logging configuration
logging.basicConfig(level=logging.INFO) # Change to DEBUG for verbose output
- Frame processing statistics via the /health endpoint
- Real-time FPS display in UI
- WebSocket connection status indicators
- Detection confidence tracking
- Open browser developer tools (F12)
- Check Console tab for JavaScript errors
- Monitor Network tab for WebSocket connections
- Use Performance tab for profiling
- GitHub Issues: Report bugs and request features
- Discussions: Ask questions and share tips
- Wiki: Extended documentation and tutorials
- Code review guidelines in CONTRIBUTING.md
- Development setup instructions
- API documentation and examples
# Build Tauri application
npm run tauri build
# Output locations:
# - Windows: target/release/bundle/msi/
# - macOS: target/release/bundle/dmg/
# - Linux: target/release/bundle/deb/ or target/release/bundle/rpm/
# Install production dependencies
pip install -r requirements.txt
# Run with production server
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 1
FROM python:3.11
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
- Tauri automatically generates platform-specific installers
- Code signing certificates recommended for distribution
- App store submission guidelines available
Include with distribution:
- Minimum system requirements
- Installation instructions
- Camera permission setup
- Roboflow API setup guide
We welcome contributions to AirCut! Here's how to get started:
- Fork the repository
- Clone your fork locally
- Follow the installation instructions
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
- Follow existing code style and patterns
- Add tests for new functionality
- Update documentation as needed
- Use clear commit messages
- Ensure all tests pass
- New gesture recognition algorithms
- Additional UI features and improvements
- Performance optimizations
- Platform-specific enhancements
- Documentation improvements
- Bug fixes and testing
This project is licensed under the MIT License - see the LICENSE file for details.
- Roboflow - Computer vision platform and API
- Tauri Team - Desktop application framework
- OpenCV - Computer vision library
- FastAPI - High-performance web framework
- React Team - UI library and ecosystem
- Project Repository: GitHub
- Issues & Bug Reports: GitHub Issues
- Feature Requests: GitHub Discussions
Built with ❤️ for intuitive human-computer interaction