A React Native Expo wrapper for WhisperKit - Apple's on-device speech recognition framework.
## Features

- 🎙️ File-based transcription - Transcribe audio files with high accuracy
- 🌍 Multi-language support - Detect and transcribe in 99 languages
- 📦 Multiple models - Choose from tiny to large models based on your needs
- ⚡ Hardware acceleration - Leverages Apple's Neural Engine for fast processing
- 🔄 Automatic model downloads - Models download automatically when first used
## Installation

```sh
npm install whisper-kit-expo
```
After installation, you need to:

- Run `npx expo prebuild` to generate the native iOS files
- Navigate to the `ios` directory and run `pod install`
- Add a microphone usage description to your `Info.plist`:

```xml
<key>NSMicrophoneUsageDescription</key>
<string>This app needs access to the microphone for speech recognition</string>
```
## Quick Start

```ts
import { transcribe, loadTranscriber } from 'whisper-kit-expo';

// Initialize the transcriber (downloads the model on first run)
await loadTranscriber();

// Transcribe an audio file
const text = await transcribe('/path/to/audio.m4a');
console.log(text);
```
Alternatively, wrap your app in the `TranscriberInitializer` component to load the transcriber on mount:

```tsx
import { TranscriberInitializer } from 'whisper-kit-expo';

function App() {
  return (
    <TranscriberInitializer>
      <YourApp />
    </TranscriberInitializer>
  );
}
```
## Models

WhisperKit automatically downloads models from Hugging Face when you first use them. Models are cached locally for subsequent use.
```ts
import { loadTranscriber } from 'whisper-kit-expo';

// Load a specific model (downloads automatically if not present)
await loadTranscriber({
  model: 'openai_whisper-base', // Model variant name
  prewarm: true,
});
```
English-only models (smaller, faster for English):

- `tiny.en` (39MB) - Fastest English model
- `base.en` (74MB) - Good balance for English
- `small.en` (244MB) - More accurate English
- `medium.en` (769MB) - High-accuracy English
Multilingual models (support 99 languages):

- `tiny` (39MB) - Fastest multilingual
- `base` (74MB) - Good balance (recommended to start)
- `small` (244MB) - More accurate
- `medium` (769MB) - High accuracy
- `large-v2` (1.5GB) - Previous best model
- `large-v3` (1.5GB) - Latest, most accurate
- `large-v3-turbo` (954MB) - Optimized large-v3
Distilled models (optimized for speed):

- `distil-large-v3` (756MB) - 2x faster than large-v3
Use these exact names when loading models:

- `openai_whisper-tiny.en`
- `openai_whisper-tiny`
- `openai_whisper-base.en`
- `openai_whisper-base`
- `openai_whisper-small.en`
- `openai_whisper-small`
- `openai_whisper-medium.en`
- `openai_whisper-medium`
- `openai_whisper-large-v2`
- `openai_whisper-large-v3`
- `openai_whisper-large-v3_turbo`
- `distil-whisper_distil-large-v3`
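The `openai_whisper-*` names above follow a regular pattern (variant name plus an optional `.en` suffix), which you can capture in a small helper if you build identifiers programmatically. This is a hypothetical sketch, not part of the package, and it covers only the `openai_whisper-*` names:

```ts
// Hypothetical helper (not part of whisper-kit-expo): build the exact
// WhisperKit identifier from a short Whisper variant name.
type Variant = 'tiny' | 'base' | 'small' | 'medium' | 'large-v2' | 'large-v3';

function modelId(variant: Variant, englishOnly = false): string {
  // Only tiny/base/small/medium ship English-only (.en) variants
  const hasEnglishOnly = ['tiny', 'base', 'small', 'medium'].includes(variant);
  const suffix = englishOnly && hasEnglishOnly ? '.en' : '';
  return `openai_whisper-${variant}${suffix}`;
}

console.log(modelId('base', true)); // "openai_whisper-base.en"
console.log(modelId('large-v3'));   // "openai_whisper-large-v3"
```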
## Advanced Transcription

```ts
import { transcribeWithOptions } from 'whisper-kit-expo';

const result = await transcribeWithOptions('/path/to/audio.m4a', {
  language: 'es',       // Force Spanish, or leave blank for auto-detect
  wordTimestamps: true, // Get word-level timing
  task: 'transcribe',   // or 'translate' to translate into English
  // Real-time progress updates
  progressCallback: (progress) => {
    console.log('Current text:', progress.text);
    console.log('Tokens:', progress.tokens.length);
    console.log('Avg log probability:', progress.avgLogprob);
    console.log('Compression ratio:', progress.compressionRatio);
  },
});

console.log(result.text);     // Full transcription
console.log(result.language); // Detected language
console.log(result.segments); // Detailed segments with timestamps
```
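The `segments` array lends itself to post-processing. As a sketch, the helper below formats segments as SRT subtitle entries; the `start`/`end` (seconds) and `text` fields are assumptions based on typical Whisper output, not confirmed by this package's types:

```ts
// Sketch: format transcription segments as SRT subtitle entries.
// The start/end-in-seconds and text fields are assumed, matching
// typical Whisper segment output.
type Segment = { start: number; end: number; text: string };

// e.g. 2.5 -> "00:00:02,500"
function toTimestamp(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const pad = (n: number, w = 2) => String(n).padStart(w, '0');
  const h = Math.floor(ms / 3_600_000);
  const m = Math.floor((ms % 3_600_000) / 60_000);
  const s = Math.floor((ms % 60_000) / 1000);
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${toTimestamp(seg.start)} --> ${toTimestamp(seg.end)}\n${seg.text.trim()}\n`)
    .join('\n');
}

console.log(toSrt([{ start: 0, end: 2.5, text: ' Hello world' }]));
```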
## Language Detection

```ts
import { detectLanguage, getSupportedLanguages } from 'whisper-kit-expo';

// Detect the language of an audio file
const detection = await detectLanguage('/path/to/audio.m4a');
console.log(`Detected: ${detection.detectedLanguage}`);
console.log('Probabilities:', detection.languageProbabilities);

// Get all supported languages
const languages = getSupportedLanguages();
// { "en": "English", "es": "Spanish", "fr": "French", ... }
```
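If you only need the single most likely language, the probability map can be reduced to a best guess. A minimal sketch, assuming `languageProbabilities` is a plain `{ code: probability }` object (its exact shape is not documented here):

```ts
// Sketch: reduce a { code: probability } map, like the one returned in
// detection.languageProbabilities above (shape assumed), to the single
// most probable language code.
function topLanguage(probs: Record<string, number>): string | undefined {
  let best: string | undefined;
  let bestP = -Infinity;
  for (const [code, p] of Object.entries(probs)) {
    if (p > bestP) {
      best = code;
      bestP = p;
    }
  }
  return best;
}

console.log(topLanguage({ en: 0.1, es: 0.85, fr: 0.05 })); // "es"
```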
## Model Management

```ts
import { getAvailableModels, downloadModel, deleteModel } from 'whisper-kit-expo';

// List available models
const models = await getAvailableModels();
models.forEach((model) => {
  console.log(`${model.name}: ${model.description}`);
  console.log(`Downloaded: ${model.isDownloaded}`);
});

// Pre-download a model
await downloadModel('large-v3');

// Delete a model to free up space
await deleteModel('large-v3');
```
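Since each model object exposes `name`, `description`, and `isDownloaded` (per the example above), you can, for instance, split the catalog into downloaded and not-yet-downloaded models before offering a download UI. A hypothetical helper, not part of the package:

```ts
// Sketch: partition the list returned by getAvailableModels() into
// downloaded and remote model names. Field names are taken from the
// usage example above.
type ModelInfo = { name: string; description: string; isDownloaded: boolean };

function partitionModels(models: ModelInfo[]): { downloaded: string[]; remote: string[] } {
  const downloaded = models.filter((m) => m.isDownloaded).map((m) => m.name);
  const remote = models.filter((m) => !m.isDownloaded).map((m) => m.name);
  return { downloaded, remote };
}

const demo = partitionModels([
  { name: 'base', description: 'Good balance', isDownloaded: true },
  { name: 'large-v3', description: 'Most accurate', isDownloaded: false },
]);
console.log(demo); // { downloaded: ["base"], remote: ["large-v3"] }
```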
## API

- `loadTranscriber(options?)` - Initialize the transcriber with optional model configuration.
- `transcribe(path)` - Simple transcription function that returns the transcribed text.
- `transcribeWithOptions(path, options)` - Advanced transcription with options for language, timestamps, and more.
- `detectLanguage(path)` - Detect the language of an audio file.
- `getSupportedLanguages()` - Get all supported languages as ISO codes with names.
- `getAvailableModels()` - Get the list of available Whisper models.
- `downloadModel(name)` - Pre-download a specific model.
- `deleteModel(name)` - Delete a downloaded model.
- Check whether the transcriber is initialized and ready.
## Types

```ts
type ModelOptions = {
  model?: string; // Model variant name
  downloadBase?: string;
  modelFolder?: string;
  prewarm?: boolean;
};

type TranscriptionOptions = {
  task?: 'transcribe' | 'translate';
  language?: string; // ISO 639-1 code
  temperature?: number;
  wordTimestamps?: boolean;
  // ... more options
};

type TranscriptionResult = {
  text: string;
  segments: TranscriptionSegment[];
  language?: string;
};
```
- **First Use**: When you call `loadTranscriber()` with a model, WhisperKit checks whether it exists locally
- **Auto-Download**: If not present, it downloads the model from Hugging Face
- **Local Cache**: Models are stored in `~/Documents/huggingface/models/`
- **Reuse**: Subsequent uses load from the cache instantly
- **Model Selection**:
  - Start with `base` for a good balance
  - Use `.en` variants if you only need English
  - `distil-large-v3` offers large-model quality at 2x the speed
- **Memory Usage**: Larger models require more memory
- **First Run**: The initial model download may take time depending on model size
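The selection guidance above can be folded into a simple chooser. This is a hypothetical sketch, not a package API; the returned names come from the exact-names list earlier in this README:

```ts
// Hypothetical chooser based on the tips above: prefer .en variants for
// English-only use, and the distilled large model when accuracy matters
// (roughly large-v3 quality at 2x the speed).
function recommendModel(opts: { englishOnly?: boolean; highAccuracy?: boolean }): string {
  if (opts.highAccuracy) {
    return 'distil-whisper_distil-large-v3';
  }
  return opts.englishOnly ? 'openai_whisper-base.en' : 'openai_whisper-base';
}

console.log(recommendModel({ englishOnly: true })); // "openai_whisper-base.en"
console.log(recommendModel({ highAccuracy: true })); // "distil-whisper_distil-large-v3"
```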
Supported audio formats:

- WAV
- MP3
- M4A
- FLAC
- AAC (in an M4A container)
Requirements:

- iOS 17.0+
- Expo SDK 53+
- React Native 0.76.6+
## Troubleshooting

- If models fail to download, check your internet connection and available storage space.
- For large models on older devices, you may need to use smaller models or close other apps.
- Ensure your audio files are in a supported format and accessible at the provided path.
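For the last point, a quick pre-flight check on the file extension can catch unsupported formats before they reach the transcriber. A hypothetical helper; the extension list is taken from the supported-formats section above:

```ts
// Hypothetical pre-flight check: confirm a file path has one of the
// supported audio extensions (from the supported-formats list) before
// passing it to transcribe().
const SUPPORTED_EXTENSIONS = ['wav', 'mp3', 'm4a', 'flac'];

function isSupportedAudio(path: string): boolean {
  const parts = path.toLowerCase().split('.');
  // No extension at all -> not supported
  if (parts.length < 2) return false;
  return SUPPORTED_EXTENSIONS.includes(parts[parts.length - 1]);
}

console.log(isSupportedAudio('/path/to/audio.m4a')); // true
console.log(isSupportedAudio('/path/to/audio.ogg')); // false
```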
## License

MIT

Built on top of WhisperKit by Argmax Inc.