Model Sizes
| Model | Disk | RAM | Speed (CPU) | Quality |
|---|---|---|---|---|
| tiny | 75 MB | ~390 MB | ~32x real-time | Good for drafts |
| base | 142 MB | ~500 MB | ~16x real-time | Solid accuracy |
| small | 466 MB | ~1 GB | ~6x real-time | Very good |
| medium | 1.5 GB | ~2.6 GB | ~2x real-time | Excellent |
| large-v3 | 3.1 GB | ~4.8 GB | ~1x real-time | Best quality |
Hardware Acceleration
| Platform | Backend | Speedup |
|---|---|---|
| Apple Silicon | Core ML + Metal | 4-8x |
| NVIDIA GPU | CUDA | 5-10x |
| Intel CPU | OpenVINO | 2-3x |
| Any CPU | AVX2/NEON | Baseline |
| Raspberry Pi 4 | ARM NEON | Usable (tiny model) |
Language Bindings
whisper.cpp has community bindings for virtually every language:
- Python:
pywhispercpp—pip install pywhispercpp - Node.js:
whisper-node—npm install whisper-node - Go:
go-whisper - Rust:
whisper-rs - C#/.NET:
Whisper.net - Java:
whisper-jni - Ruby:
ruby-whisper
Features
- Streaming — real-time transcription from microphone
- Timestamps — word-level and segment-level timing
- Translation — transcribe and translate to English simultaneously
- VAD — voice activity detection to skip silence
- Speaker diarization — basic speaker identification
- Output formats — TXT, SRT, VTT, JSON, CSV