Is WhisperX — 70x Faster Speech Recognition free to use?

Yes. WhisperX — 70x Faster Speech Recognition is freely available on TokRepo. Check the Source & Thanks section on the asset page for the specific open-source license.

How do I install WhisperX — 70x Faster Speech Recognition?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

Scripts · Apr 1, 2026 · 2 min read

WhisperX — 70x Faster Speech Recognition

Name: WhisperX — 70x Faster Speech Recognition
Author: TokRepo Picks

WhisperX provides 70x realtime speech recognition through batched inference, with word-level timestamps and speaker diarization. It runs in under 8GB of VRAM, has 21K+ GitHub stars, and is licensed under BSD-2-Clause.

TokRepo Picks · Community

Quick Use

Use it first, then decide how deep to go

# Install
pip install whisperx

# Transcribe with word timestamps + speaker labels
# (--diarize needs a Hugging Face access token, passed with --hf_token)
whisperx audio.mp3 --model large-v2 --diarize --language en --hf_token YOUR_HF_TOKEN

# Or in Python
python -c "
import whisperx

device = 'cuda'
audio = whisperx.load_audio('audio.mp3')

# Transcribe with batched inference
model = whisperx.load_model('large-v2', device=device)
result = model.transcribe(audio, batch_size=16)

# Align for word-level timestamps, using the language detected during transcription
model_a, metadata = whisperx.load_align_model(language_code=result['language'], device=device)
result = whisperx.align(result['segments'], model_a, metadata, audio, device=device)
print(result['segments'])
"
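For the subtitling use case, the aligned segments can be turned into SRT text with a few lines of plain Python. The sketch below is not part of WhisperX; it assumes each segment is a dict with 'start' and 'end' in seconds and a 'text' string, as printed by the snippet above. The helper names (format_srt_time, segments_to_srt) and the sample values are hypothetical.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT text from segments that carry 'start', 'end', and 'text' keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_time(seg["start"])
        end = format_srt_time(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


# Hypothetical sample shaped like result['segments']; the values are invented.
sample_segments = [
    {"start": 0.0, "end": 1.52, "text": " Hello there."},
    {"start": 1.9, "end": 3.605, "text": " How are you?"},
]
srt_text = segments_to_srt(sample_segments)
print(srt_text)
```

In a real run, result['segments'] from the alignment step would replace sample_segments, provided it has the assumed keys.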

Intro

WhisperX is an automatic speech recognition tool that reaches 70x realtime transcription by running batched inference on OpenAI's Whisper large-v2 model. It adds accurate word-level timestamps through wav2vec2 alignment and speaker labels through diarization. Voice activity detection (VAD) preprocessing filters out silence and noise before transcription, and large models run in under 8GB of GPU memory. The project has 21,000+ GitHub stars and is released under the BSD-2-Clause license.

Best for: Developers building transcription, subtitling, or meeting analysis with speaker identification
Works with: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf
Performance: 70x realtime with large-v2, under 8GB VRAM
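To make the "70x realtime" figure concrete, the arithmetic is simply audio duration divided by the speedup factor. The sketch below is a rough estimate only: it takes the quoted 70x as a given, and it assumes that model loading, alignment, and diarization time are not counted. The function name is hypothetical, and actual throughput will depend on the GPU, batch size, and audio.

```python
def estimated_processing_seconds(audio_seconds: float, realtime_factor: float = 70.0) -> float:
    """Estimate wall-clock transcription time for a given speedup over realtime."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


one_hour = 60 * 60  # seconds of audio
estimate = estimated_processing_seconds(one_hour)
print(f"One hour of audio at 70x realtime: about {estimate:.1f} s")  # 3600 / 70, about 51.4 s
```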

Key Features

70x realtime: Batched inference for dramatically faster transcription
Word-level timestamps: Precise timing via wav2vec2 alignment
Speaker diarization: Identify who said what using pyannote-audio
VAD preprocessing: Voice activity detection filters silence/noise
Under 8GB VRAM: Runs large models on consumer GPUs
CLI + Python API: Command-line tool and programmatic access
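The word-level timestamps and speaker diarization features combine naturally: once each word has a start and end time and each speaker turn has a time range, a word can be labeled with the speaker whose turn overlaps it most. The sketch below illustrates that idea in plain Python. It is a simplified illustration, not the WhisperX or pyannote-audio implementation, and the function names, dict keys, and sample data are assumptions made for the example.

```python
def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: list[dict], turns: list[dict]) -> list[dict]:
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn get speaker None.
    """
    labeled = []
    for word in words:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            shared = overlap(word["start"], word["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        labeled.append({**word, "speaker": best_speaker})
    return labeled


# Hypothetical word timings and speaker turns; the values are invented.
words = [
    {"word": "Hello", "start": 0.10, "end": 0.45},
    {"word": "there", "start": 0.50, "end": 0.90},
    {"word": "Hi", "start": 1.20, "end": 1.40},
]
turns = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 1.0},
    {"speaker": "SPEAKER_01", "start": 1.0, "end": 2.0},
]
labeled_words = assign_speakers(words, turns)
for item in labeled_words:
    print(item["speaker"], item["word"])
```

Picking the maximum overlap keeps the rule simple when a word straddles a turn boundary; a production pipeline may handle ties and gaps differently.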

FAQ

Q: What is WhisperX?
A: WhisperX is a speech recognition tool with 21K+ GitHub stars that provides 70x realtime transcription, word-level timestamps, and speaker diarization. It runs in under 8GB of VRAM and is licensed under BSD-2-Clause.

Q: How do I install WhisperX?
A: Run pip install whisperx, then run whisperx audio.mp3 --model large-v2 --diarize for the full pipeline.

Source & Thanks

Created by Max Bain. Licensed under BSD-2-Clause. m-bain/whisperX — 21,000+ GitHub stars
