Key Features
- 70x realtime: Batched inference transcribes audio up to about 70 times faster than real time
- Word-level timestamps: Precise timing via wav2vec2 alignment
- Speaker diarization: Identify who said what using pyannote-audio
- VAD preprocessing: Voice activity detection filters silence/noise
- Under 8GB VRAM: Runs large models on consumer GPUs
- CLI + Python API: Command-line tool and programmatic access
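
To illustrate how word-level timestamps and speaker labels can be combined downstream, here is a minimal Python sketch. The `words` list is hand-written sample data, and its keys (`word`, `start`, `end`, `speaker`) are assumptions about what an aligned, diarized transcript might look like. `group_by_speaker` is a hypothetical helper for illustration, not part of WhisperX.

```python
from typing import Dict, List


def group_by_speaker(words: List[Dict]) -> List[Dict]:
    """Merge consecutive words from the same speaker into one turn.

    Hypothetical helper: the input schema is an assumption, not WhisperX's API.
    """
    turns: List[Dict] = []
    for w in words:
        speaker = w.get("speaker", "UNKNOWN")
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + w["word"]
            turns[-1]["end"] = w["end"]
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append(
                {
                    "speaker": speaker,
                    "start": w["start"],
                    "end": w["end"],
                    "text": w["word"],
                }
            )
    return turns


# Hand-written sample data (assumed shape, for illustration only).
words = [
    {"word": "Hello", "start": 0.10, "end": 0.42, "speaker": "SPEAKER_00"},
    {"word": "there.", "start": 0.45, "end": 0.80, "speaker": "SPEAKER_00"},
    {"word": "Hi!", "start": 1.20, "end": 1.45, "speaker": "SPEAKER_01"},
]

for turn in group_by_speaker(words):
    print(f"[{turn['start']:.2f}-{turn['end']:.2f}] {turn['speaker']}: {turn['text']}")
```

With the sample data above, this prints one line per speaker turn, each with the start time of its first word and the end time of its last.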
FAQ
Q: What is WhisperX?
A: WhisperX is a speech recognition tool with 21K+ stars that provides 70x realtime transcription, word-level timestamps, and speaker diarization. It runs in under 8GB of VRAM and is licensed under BSD-2-Clause.
Q: How do I install WhisperX?
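A: Install it with pip install whisperx, then run whisperx audio.mp3 --model large-v2 --diarize for the full pipeline. Diarization relies on gated pyannote models, so it usually also needs a Hugging Face access token passed with --hf_token.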
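
For scripted use, the same command line can be assembled from Python. This is a minimal sketch, not an official wrapper: `build_whisperx_command` is a hypothetical helper, `audio.mp3` is the placeholder file name from the answer above, and the optional `--hf_token` argument is included on the assumption that diarization requires a Hugging Face access token.

```python
import os
import shutil
import subprocess
from typing import List, Optional


def build_whisperx_command(
    audio_path: str,
    model: str = "large-v2",
    diarize: bool = True,
    hf_token: Optional[str] = None,
) -> List[str]:
    """Assemble the whisperx CLI invocation as an argument list.

    Hypothetical helper for illustration; it only builds the command.
    """
    cmd = ["whisperx", audio_path, "--model", model]
    if diarize:
        cmd.append("--diarize")
        if hf_token:
            # Assumption: the token is needed to download the diarization models.
            cmd += ["--hf_token", hf_token]
    return cmd


cmd = build_whisperx_command("audio.mp3")
print(" ".join(cmd))  # whisperx audio.mp3 --model large-v2 --diarize

# Run only when the CLI is installed and the placeholder file actually exists.
if shutil.which("whisperx") and os.path.exists("audio.mp3"):
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.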