Scripts · Mar 31, 2026 · 2 min read

WhisperX — 70x Faster Speech Recognition

WhisperX delivers 70x realtime speech recognition with word-level timestamps and speaker diarization, using batched inference in under 8GB of VRAM. 21K+ GitHub stars; BSD-2-Clause licensed.

TL;DR
WhisperX runs Whisper at 70x realtime speed with word-level timestamps and speaker diarization.
§01

What it is

WhisperX is a speech recognition system that accelerates OpenAI's Whisper model to 70x realtime speed through batched inference. It adds word-level timestamps via forced alignment, plus speaker diarization to identify who said what. Even the large-v2 model runs in under 8GB of VRAM, and the project is licensed under BSD-2-Clause.

Researchers, podcast producers, and developers building transcription pipelines will find WhisperX useful when standard Whisper is too slow or when per-word timing and speaker labels are required.
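
A minimal Python sketch of the pipeline, with function names taken from the project README (whisperx.load_model, whisperx.align, and friends; exact signatures can shift between releases):

import whisperx

device = "cuda"

# 1. Transcribe with batched inference -- this is where the 70x speedup comes from
model = whisperx.load_model("large-v2", device, compute_type="float16")
audio = whisperx.load_audio("audio.mp3")
result = model.transcribe(audio, batch_size=16)

# 2. Forced alignment (wav2vec2) maps each word to its start/end time
align_model, metadata = whisperx.load_align_model(
    language_code=result["language"], device=device
)
result = whisperx.align(result["segments"], align_model, metadata, audio, device)

print(result["segments"])  # segments now carry per-word timestamps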

§02

How it saves time or tokens

Standard Whisper processes audio sequentially, making long recordings slow to transcribe. WhisperX batches audio segments and processes them in parallel, cutting transcription time drastically. The word-level alignment and diarization run as post-processing steps, so you get richer output without re-running the model.
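
Continuing the sketch above: diarization attaches speaker labels to the transcript you already have, so the ASR model never runs a second pass. Class and function names follow the README (newer releases namespace it as whisperx.diarize.DiarizationPipeline); YOUR_HF_TOKEN is a placeholder:

# Post-processing only -- no second transcription pass
diarize_model = whisperx.DiarizationPipeline(use_auth_token="YOUR_HF_TOKEN", device=device)
diarize_segments = diarize_model(audio)
result = whisperx.assign_word_speakers(diarize_segments, result)
# each segment and word now carries a "speaker" label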

§03

How to use

  1. Install WhisperX with pip (a minimal install sketch follows this list) and ensure you have a CUDA-capable GPU with at least 8GB VRAM.
  2. Run the CLI command with your audio file and desired output format.
  3. Optionally enable speaker diarization by providing a HuggingFace token for the pyannote models.
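
A minimal install, assuming a recent Python and a working CUDA setup (package name as published on PyPI):

pip install whisperx

# diarization needs a HuggingFace token that has accepted the pyannote model terms
export HF_TOKEN=your_token_here   # or pass --hf_token on the command line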
§04

Example

# Basic transcription with word-level timestamps
whisperx audio.mp3 --model large-v2 --output_dir ./output

# With speaker diarization
whisperx audio.mp3 --model large-v2 --diarize \
  --hf_token YOUR_HF_TOKEN --output_dir ./output

# Specify language and output format
whisperx audio.mp3 --model large-v2 --language en \
  --output_format srt --output_dir ./output
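
If the defaults don't fit your GPU, batch size and precision are the usual knobs. The flag names below come from the CLI help; confirm against whisperx --help on your installed version:

# Trade speed for VRAM: smaller batches and int8 weights fit tighter GPUs
whisperx audio.mp3 --model large-v2 --batch_size 8 \
  --compute_type int8 --output_dir ./output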
§05

Common pitfalls

  • Running without a CUDA GPU. WhisperX's speed advantage comes from GPU batching; CPU-only mode is significantly slower.
  • Forgetting the HuggingFace token for diarization. The pyannote speaker diarization models require authentication through HuggingFace.
  • Using the wrong model size for your VRAM. The large-v2 model needs close to 8GB; smaller GPUs should use the medium or small variants (see the low-VRAM example after this list).
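
For example, a hedged low-VRAM invocation (exact memory needs vary by GPU and release):

# medium + int8 keeps memory well below the large-v2 float16 footprint
whisperx audio.mp3 --model medium --compute_type int8 --output_dir ./output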

Frequently Asked Questions

How much faster is WhisperX compared to standard Whisper?

WhisperX reaches 70x realtime speed through batched inference. At that rate, a one-hour audio file transcribes in under a minute on a capable GPU, where standard Whisper's sequential decoding takes many times longer.

What is word-level timestamp alignment?

After transcription, WhisperX uses forced alignment (via wav2vec2) to map each word to its exact start and end time in the audio. This is more precise than Whisper's default segment-level timestamps.
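
Roughly what an aligned segment looks like (key names follow the README's output format; the values here are made up for illustration):

{
    "start": 12.34,
    "end": 15.10,
    "text": "hello and welcome back",
    "words": [
        {"word": "hello", "start": 12.34, "end": 12.61, "score": 0.98},
        # ... one entry per word
    ],
}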

Does WhisperX support multiple languages?

Yes. WhisperX inherits Whisper's multilingual support. You can specify the language with the --language flag or let it auto-detect. Forced alignment models are available for many languages.

What GPU do I need to run WhisperX?

A CUDA-capable GPU with at least 8GB VRAM is recommended for the large-v2 model. Smaller models (medium, small, base) work on GPUs with less VRAM. CPU-only mode works but loses the speed advantage.
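
A hedged CPU-only invocation (it works, but expect nothing near 70x realtime; int8 keeps compute and memory manageable on CPU):

whisperx audio.mp3 --model small --device cpu \
  --compute_type int8 --output_dir ./output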

Can I use WhisperX for live streaming audio?

WhisperX is designed for batch processing of recorded audio files. It is not optimized for real-time streaming. For live transcription, consider streaming-focused alternatives.


Source & Thanks

Created by Max Bain. Licensed under BSD-2-Clause. m-bain/whisperX — 21,000+ GitHub stars
