Scripts · Apr 1, 2026 · 2 min read

WhisperX — 70x Faster Speech Recognition

WhisperX provides 70x realtime speech recognition with word-level timestamps and speaker diarization. 21K+ GitHub stars. Batched inference, under 8GB VRAM. BSD-2-Clause.

TokRepo Picks · Community
Quick Use

Use it first, then decide how deep to go

# Install
pip install whisperx

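# Transcribe with word timestamps + speaker labels
# (--diarize relies on pyannote models; supply a Hugging Face access token via --hf_token if required)
whisperx audio.mp3 --model large-v2 --diarize --language en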

# Or in Python
python -c "
import whisperx
model = whisperx.load_model('large-v2', device='cuda')
audio = whisperx.load_audio('audio.mp3')
result = model.transcribe(audio, batch_size=16)
# Align for word-level timestamps
model_a, metadata = whisperx.load_align_model(language_code='en', device='cuda')
result = whisperx.align(result['segments'], model_a, metadata, audio, device='cuda')
print(result['segments'])
"
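The aligned segments printed above can be turned into subtitles with a few lines of plain Python. This is a minimal sketch, assuming each segment is a dict with `start` and `end` times in seconds and a `text` string (check this against the output of your installed version). `format_timestamp` and `segments_to_srt` are hypothetical helpers, not part of WhisperX, and the sample segments are made up.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments (assumed keys: start, end, text) as SRT subtitle text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Join with a newline so cues are separated by one blank line
    return "\n".join(blocks)


# Made-up sample data standing in for result['segments']
sample_segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 5.04, "text": " Let's get started."},
]
srt_text = segments_to_srt(sample_segments)
print(srt_text)
```

In a real run you would pass `result['segments']` instead of `sample_segments` and write `srt_text` to a `.srt` file.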

Intro

WhisperX is an automatic speech recognition tool built on OpenAI's Whisper. Batched inference lets it reach 70x realtime transcription with the large-v2 model, and large models fit in under 8GB of GPU memory. Voice activity detection (VAD) preprocessing filters out silence and noise, wav2vec2 alignment adds accurate word-level timestamps, and speaker diarization labels who is speaking. The project has 21,000+ GitHub stars and is released under the BSD-2-Clause license.

Best for: Developers building transcription, subtitling, or meeting analysis with speaker identification
Works with: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf
Performance: 70x realtime with large-v2, under 8GB VRAM
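To put "70x realtime" in concrete terms: at that rate, processing time is roughly the audio duration divided by 70. The snippet below is a back-of-the-envelope sketch, assuming the quoted factor holds for your hardware and audio (actual throughput will vary). `estimated_processing_seconds` is an illustrative helper, not a WhisperX function.

```python
def estimated_processing_seconds(audio_seconds: float, realtime_factor: float = 70.0) -> float:
    """Estimate processing time for audio at a given realtime speed-up factor."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


one_hour = 60 * 60  # one hour of audio, in seconds
estimate = estimated_processing_seconds(one_hour)
print(f"{one_hour} s of audio -> about {estimate:.1f} s of processing")
```

Under this assumption, an hour of audio would take a little under a minute to transcribe.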


Key Features

  • 70x realtime: Batched inference for dramatically faster transcription
  • Word-level timestamps: Precise timing via wav2vec2 alignment
  • Speaker diarization: Identify who said what using pyannote-audio
  • VAD preprocessing: Voice activity detection filters silence/noise
  • Under 8GB VRAM: Runs large models on consumer GPUs
  • CLI + Python API: Command-line tool and programmatic access
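One way to picture how word-level timestamps and speaker diarization combine into "who said what" is to give each word the speaker whose turn overlaps it the most. The sketch below is a simplified illustration with made-up data and hypothetical names (`overlap`, `assign_speakers`); it is not WhisperX's or pyannote-audio's actual implementation.

```python
def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: list[dict], turns: list[dict]) -> list[dict]:
    """Label each word with the speaker turn that overlaps it the most."""
    labeled = []
    for word in words:
        best_speaker = None  # stays None if no turn overlaps the word
        best_overlap = 0.0
        for turn in turns:
            shared = overlap(word["start"], word["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        labeled.append({**word, "speaker": best_speaker})
    return labeled


# Made-up word timings (as from alignment) and speaker turns (as from diarization)
words = [
    {"word": "Hi", "start": 0.2, "end": 0.5},
    {"word": "there", "start": 0.6, "end": 1.0},
    {"word": "Hello", "start": 1.4, "end": 1.9},
]
turns = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 1.2},
    {"speaker": "SPEAKER_01", "start": 1.2, "end": 3.0},
]
labeled_words = assign_speakers(words, turns)
for item in labeled_words:
    print(item["speaker"], item["word"])
```

Words that fall outside every speaker turn keep `speaker` set to `None`, which makes gaps in the diarization easy to spot downstream.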

FAQ

Q: What is WhisperX?
A: WhisperX is a speech recognition tool with 21K+ GitHub stars that provides 70x realtime transcription, word-level timestamps, and speaker diarization. It runs in under 8GB of VRAM and is licensed under BSD-2-Clause.

Q: How do I install WhisperX?
A: Run pip install whisperx, then run whisperx audio.mp3 --model large-v2 --diarize for the full pipeline.



Source & Thanks

Created by Max Bain. Licensed under BSD-2-Clause. Repository: m-bain/whisperX (21,000+ GitHub stars).
