My investigation of Voxtral Mini-3B's capabilities.
On my system --torch-backend=auto fails, so I have to set it manually after querying nvidia-smi (see astral-sh/uv#14647).
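One way to derive the value (a quick sketch; it assumes the CUDA version in the nvidia-smi header maps directly to a backend tag, e.g. 12.6 -> cu126):

import re
import subprocess

# Read the "CUDA Version: X.Y" field from the nvidia-smi header.
out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
m = re.search(r"CUDA Version: (\d+)\.(\d+)", out)
backend = f"cu{m.group(1)}{m.group(2)}" if m else "cpu"
print(backend)  # e.g. cu126 -> pass --torch-backend=cu126 below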
uv venv --python 3.12 --seed
source .venv/bin/activate
# There is a warning about flashinfer not being available, but I couldn't install it because vllm is on torch 2.7
# uv pip install flashinfer-python -i https://flashinfer.ai/whl/cu126/torch2.6
uv pip install -U "vllm[audio]" --torch-backend=cu126 --extra-index-url https://wheels.vllm.ai/nightly
uv pip install --upgrade mistral_common\[audio\] "numpy<2.3"
# I serve on custom port 8333
vllm serve mistralai/Voxtral-Mini-3B-2507 --port 8333 --tokenizer_mode mistral --config_format mistral --load_format mistral
# Test transcription
uv run transcribe.py
# or
uv run streaming.py
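For reference, a minimal transcription request against the server could look like this (a sketch, not necessarily what transcribe.py does; obama.mp3 is a placeholder, and it assumes vLLM's OpenAI-compatible /v1/audio/transcriptions endpoint):

from openai import OpenAI

# vLLM exposes an OpenAI-compatible API on the custom port from above.
client = OpenAI(base_url="http://localhost:8333/v1", api_key="EMPTY")

with open("obama.mp3", "rb") as f:  # placeholder file name
    result = client.audio.transcriptions.create(
        model="mistralai/Voxtral-Mini-3B-2507",
        file=f,
        language="en",
    )
print(result.text)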
The vad_streaming.py script uses auditok for Voice Activity Detection (VAD) and streams the detected audio chunks to the Voxtral model for transcription; a sketch of the approach follows the commands below. On an RTX 3090 it transcribes the 180-second Obama audio in 8 seconds, and the 31-minute Count of Monte Cristo in 62 seconds. Max concurrency is set to 32, but an RTX 3090 is pretty much maxed out at ~16 parallel requests.
# Same installation as above, start vllm, then additionally:
uv pip install auditok
uv run vad_streaming.py
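The approach, sketched (an illustration under assumptions, not the actual vad_streaming.py; the split thresholds and audio.wav are placeholders):

import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

import auditok
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8333/v1", api_key="EMPTY")

def transcribe_region(region):
    # Write one detected speech region to a temp WAV and send it to the server.
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        region.save(path)
        with open(path, "rb") as f:
            result = client.audio.transcriptions.create(
                model="mistralai/Voxtral-Mini-3B-2507", file=f
            )
        return region.meta.start, region.meta.end, result.text
    finally:
        os.remove(path)

# Split on silence; the thresholds are guesses, tune them for your audio.
regions = auditok.split("audio.wav", max_dur=30, max_silence=0.5)

# ~16 workers, since that is roughly where an RTX 3090 saturates.
with ThreadPoolExecutor(max_workers=16) as pool:
    for start, end, text in pool.map(transcribe_region, regions):
        print(f"[{start:7.2f}s - {end:7.2f}s] {text}")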
Voxtral does not support segment- or word-level timestamps; the announcement says they are coming soon. A poor man's sliding-window implementation is provided in timestamps.py.
uv run timestamps.py
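The idea, sketched (an illustration, not the repo's timestamps.py; the window length and file name are placeholders, and it assumes pydub for slicing): cut the audio into fixed windows, transcribe each window separately, and stamp the text with the window's start offset. Overlapping windows plus aligning the duplicated text would sharpen the boundaries.

import io

from openai import OpenAI
from pydub import AudioSegment

client = OpenAI(base_url="http://localhost:8333/v1", api_key="EMPTY")
WINDOW_S = 30  # window length in seconds (placeholder)

audio = AudioSegment.from_file("audio.wav")  # placeholder file
for start_ms in range(0, len(audio), WINDOW_S * 1000):
    window = audio[start_ms : start_ms + WINDOW_S * 1000]
    buf = io.BytesIO()
    window.export(buf, format="wav")
    text = client.audio.transcriptions.create(
        model="mistralai/Voxtral-Mini-3B-2507",
        file=("window.wav", buf.getvalue()),
    ).text
    # The window's start offset is the (coarse) timestamp for this chunk.
    print(f"[{start_ms / 1000:7.1f}s] {text}")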