# WhisperX: 70x Realtime Speech Recognition

> WhisperX provides 70x realtime speech recognition with word-level timestamps and speaker diarization. 21K+ GitHub stars. Batched inference, under 8GB VRAM. BSD-2-Clause.

## Install

Install the package from PyPI with `pip install whisperx`, which also provides the `whisperx` command-line tool.

## Quick Use

```bash
# Install
pip install whisperx

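# Transcribe with word timestamps + speaker labels
# (--diarize relies on pyannote models; pass a Hugging Face token with --hf_token if access is required)
whisperx audio.mp3 --model large-v2 --diarize --language en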

# Or in Python
python -c "
import whisperx
model = whisperx.load_model('large-v2', device='cuda')
audio = whisperx.load_audio('audio.mp3')
result = model.transcribe(audio, batch_size=16)
# Align for word-level timestamps
model_a, metadata = whisperx.load_align_model(language_code='en', device='cuda')
result = whisperx.align(result['segments'], model_a, metadata, audio, device='cuda')
print(result['segments'])
"
```
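The Python snippet above ends by printing `result['segments']`. For subtitling, those segments usually need to be turned into a subtitle file. The following is a minimal sketch, assuming each segment is a dict with `start` and `end` times in seconds and a `text` string. The helper names `srt_timestamp` and `segments_to_srt` and the sample data are hypothetical and not part of WhisperX.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments (dicts with 'start', 'end', 'text') as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}"
        )
    return "\n\n".join(blocks) + "\n"


# Hypothetical sample data standing in for result['segments'].
sample_segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 5.125, "text": " Let's get started."},
]

print(segments_to_srt(sample_segments))
```

In a real run, `sample_segments` would be replaced by the aligned `result['segments']` list, assuming it has the shape described above.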

---

## Intro

WhisperX is an automatic speech recognition tool built on OpenAI's Whisper. Batched inference delivers 70x realtime transcription with the large-v2 model while needing under 8GB of GPU memory. A wav2vec2 alignment pass adds accurate word-level timestamps, speaker diarization attaches speaker labels, and voice activity detection (VAD) preprocessing filters out silence and noise before transcription. The project has 21,000+ GitHub stars and is released under the BSD-2-Clause license.

- **Best for**: Developers building transcription, subtitling, or meeting analysis with speaker identification
- **Works with**: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf
- **Performance**: 70x realtime with large-v2, under 8GB VRAM
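
To make the performance figure concrete: "70x realtime" is read here as 70 seconds of audio processed per second of compute. The sketch below works through that arithmetic. The function name is hypothetical, and the estimate treats 70x as a fixed rate, while real throughput will depend on the GPU, batch size, and audio content.

```python
def estimated_processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Estimate wall-clock processing time from a realtime speed factor."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


one_hour = 60 * 60  # seconds of audio
estimate = estimated_processing_seconds(one_hour, realtime_factor=70)
print(f"1 hour of audio at 70x realtime: about {estimate:.1f} s")
```

Under that assumption, an hour of audio takes a little under a minute to transcribe.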

---

## Key Features

- **70x realtime**: Batched inference for dramatically faster transcription
- **Word-level timestamps**: Precise timing via wav2vec2 alignment
- **Speaker diarization**: Identify who said what using pyannote-audio
- **VAD preprocessing**: Voice activity detection filters silence/noise
- **Under 8GB VRAM**: Runs large models on consumer GPUs
- **CLI + Python API**: Command-line tool and programmatic access
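
For meeting analysis, diarized output is often easier to read once consecutive segments from the same speaker are merged into turns. The following is a minimal sketch, assuming diarized segments are dicts carrying a `speaker` label (for example `SPEAKER_00`) alongside `start`, `end`, and `text`. The `speaker_turns` helper, the `UNKNOWN` fallback label, and the sample data are hypothetical and not part of WhisperX.

```python
from itertools import groupby


def speaker_turns(segments: list[dict]) -> list[dict]:
    """Merge consecutive segments spoken by the same speaker into turns."""
    turns = []
    # Segments without a speaker label are grouped under "UNKNOWN".
    keyed = groupby(segments, key=lambda seg: seg.get("speaker", "UNKNOWN"))
    for speaker, group in keyed:
        group = list(group)
        turns.append({
            "speaker": speaker,
            "start": group[0]["start"],
            "end": group[-1]["end"],
            "text": " ".join(seg["text"].strip() for seg in group),
        })
    return turns


# Hypothetical sample data standing in for diarized segments.
sample = [
    {"start": 0.0, "end": 1.8, "text": " Shall we begin?", "speaker": "SPEAKER_00"},
    {"start": 1.9, "end": 3.2, "text": " Yes.", "speaker": "SPEAKER_01"},
    {"start": 3.2, "end": 5.0, "text": " I have the agenda.", "speaker": "SPEAKER_01"},
    {"start": 5.4, "end": 6.0, "text": " Great."},
]

for turn in speaker_turns(sample):
    print(f"[{turn['start']:.1f}-{turn['end']:.1f}] {turn['speaker']}: {turn['text']}")
```

Grouping only adjacent segments keeps the turns in chronological order, so a speaker who talks again later gets a separate turn rather than being merged with an earlier one.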

---

## FAQ

**Q: What is WhisperX?**
A: WhisperX is a speech recognition tool (21K+ GitHub stars, BSD-2-Clause) that provides 70x realtime transcription, word-level timestamps, and speaker diarization in under 8GB of VRAM.

**Q: How do I install WhisperX?**
A: Run `pip install whisperx`, then `whisperx audio.mp3 --model large-v2 --diarize` for the full pipeline.

---

## Source & Thanks

> Created by [Max Bain](https://github.com/m-bain). Licensed under BSD-2-Clause.
> [m-bain/whisperX](https://github.com/m-bain/whisperX) — 21,000+ GitHub stars

---
Source: https://tokrepo.com/en/workflows/c43ad870-8c99-471a-898e-b07140faf532
Author: Script Depot