Scripts · Mar 31, 2026 · 2 min read

WhisperX — 70x Faster Speech Recognition

WhisperX provides 70x realtime speech recognition with word-level timestamps and speaker diarization. 21K+ GitHub stars. Batched inference, under 8GB VRAM. BSD-2-Clause.

Introduction

WhisperX is an automatic speech recognition tool that achieves 70x realtime transcription by running batched inference on OpenAI's Whisper large-v2 model. It adds accurate word-level timestamps via wav2vec2 forced alignment and speaker diarization via pyannote-audio. With 21,000+ GitHub stars and a BSD-2-Clause license, it requires under 8GB of GPU memory even for large models, includes voice activity detection (VAD) preprocessing to filter silence and noise, and outputs per-word timing and speaker labels.

  • Best for: Developers building transcription, subtitling, or meeting-analysis tools with speaker identification
  • Works with: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf
  • Performance: 70x realtime with large-v2, under 8GB VRAM


Key Features

  • 70x realtime: Batched inference for dramatically faster transcription
  • Word-level timestamps: Precise timing via wav2vec2 alignment
  • Speaker diarization: Identify who said what using pyannote-audio
  • VAD preprocessing: Voice activity detection filters silence/noise
  • Under 8GB VRAM: Runs large models on consumer GPUs
  • CLI + Python API: Command-line tool and programmatic access

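The diarization step above labels each segment with a speaker. As a rough sketch of how that output can be consumed downstream, the snippet below regroups segments into speaker turns ("who said what"). The dict shape mirrors WhisperX's JSON output, but treat the exact field names as an assumption for illustration:

```python
# Regroup diarized, timestamped segments into consecutive speaker turns.
# The segment field names assume WhisperX's output shape (illustrative only).
segments = [
    {"start": 0.0, "end": 1.2, "text": "Hello there.", "speaker": "SPEAKER_00"},
    {"start": 1.3, "end": 2.1, "text": "Hi!", "speaker": "SPEAKER_01"},
    {"start": 2.2, "end": 3.5, "text": "How are you?", "speaker": "SPEAKER_01"},
]

def speaker_turns(segments):
    """Merge consecutive segments from the same speaker into one turn."""
    turns = []
    for seg in segments:
        if turns and turns[-1]["speaker"] == seg.get("speaker"):
            # Same speaker as the previous segment: extend the current turn.
            turns[-1]["text"] += " " + seg["text"]
            turns[-1]["end"] = seg["end"]
        else:
            turns.append({
                "speaker": seg.get("speaker", "UNKNOWN"),
                "start": seg["start"],
                "end": seg["end"],
                "text": seg["text"],
            })
    return turns

for t in speaker_turns(segments):
    print(f"[{t['start']:.1f}-{t['end']:.1f}] {t['speaker']}: {t['text']}")
```

This kind of turn-merging is the usual first step for meeting-analysis pipelines built on diarized output.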
FAQ

Q: What is WhisperX? A: WhisperX is a speech recognition tool (21K+ GitHub stars) that provides 70x realtime transcription, word-level timestamps, and speaker diarization, running in under 8GB of VRAM. Licensed under BSD-2-Clause.

Q: How do I install WhisperX? A: `pip install whisperx`. Then run `whisperx audio.mp3 --model large-v2 --diarize` for the full pipeline (diarization requires a Hugging Face token for the pyannote models).
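For subtitling, the word-level timestamps map naturally onto the SRT format. A minimal sketch of that conversion, assuming the same segment dict shape as WhisperX's output (field names are an assumption for illustration):

```python
# Convert timestamped segments into SRT subtitle blocks.
def srt_timestamp(seconds):
    """Format a time in seconds as the SRT HH:MM:SS,mmm timestamp."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments):
    """Render a list of {'start', 'end', 'text'} segments as an SRT string."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> "
            f"{srt_timestamp(seg['end'])}\n{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

print(to_srt([{"start": 0.0, "end": 1.25, "text": "Hello there."}]))
```

Writing the result to a `.srt` file yields subtitles that most video players accept directly.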



Source and acknowledgments

Created by Max Bain. Licensed under BSD-2-Clause. m-bain/whisperX — 21,000+ GitHub stars
