Faster Whisper — 4x Faster Speech-to-Text
Faster Whisper is a reimplementation of OpenAI Whisper using CTranslate2, up to 4x faster with less memory. 21.8K+ GitHub stars. GPU/CPU, 8-bit quantization, word timestamps, VAD. MIT licensed.
What it is
Faster Whisper is a reimplementation of OpenAI's Whisper automatic speech recognition model using CTranslate2, a fast inference engine for Transformer models. It achieves up to 4x faster transcription compared to the original Whisper implementation while using less memory. The library supports GPU and CPU inference, 8-bit quantization, and all Whisper model sizes from tiny to large-v3.
Faster Whisper targets developers building transcription pipelines, voice-enabled applications, and audio processing workflows who need Whisper-quality results with better performance.
How it saves time or tokens
The original Whisper implementation uses PyTorch and processes audio at roughly real-time speed on CPU. Faster Whisper's CTranslate2 backend optimizes the computation graph, supports INT8 quantization, and batches operations more efficiently. The result is transcription that completes in a fraction of the audio duration, even on CPU.
For batch processing of audio files, the speed improvement means processing a day's worth of recordings in hours instead of a full day. GPU acceleration with float16 further reduces processing time.
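In practice, the compute type can be chosen from the available hardware at load time. A minimal sketch that uses ctranslate2 (installed as a faster-whisper dependency) to detect CUDA:
import ctranslate2
from faster_whisper import WhisperModel

# float16 on GPU, int8 on CPU: the fastest common configuration for each device.
if ctranslate2.get_cuda_device_count() > 0:
    device, compute_type = 'cuda', 'float16'
else:
    device, compute_type = 'cpu', 'int8'

model = WhisperModel('large-v3', device=device, compute_type=compute_type)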
How to use
- Install Faster Whisper:
pip install faster-whisper
- Transcribe an audio file:
from faster_whisper import WhisperModel

model = WhisperModel('large-v3', device='cuda', compute_type='float16')
segments, info = model.transcribe('audio.mp3')
for segment in segments:
    print(f'[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}')
Note that segments is a generator: transcription only runs as you iterate over it.
- For CPU with INT8 quantization:
model = WhisperModel('large-v3', device='cpu', compute_type='int8')
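On CPU, throughput also depends on how many threads CTranslate2 uses; the cpu_threads argument controls this. For example:
model = WhisperModel('large-v3', device='cpu', compute_type='int8', cpu_threads=8)  # match your physical core count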
Example
from faster_whisper import WhisperModel

model = WhisperModel('medium', device='cpu', compute_type='int8')

# Transcribe with word-level timestamps
segments, info = model.transcribe(
    'meeting.wav',
    beam_size=5,
    word_timestamps=True,
    language='en'
)

print(f'Detected language: {info.language} (prob: {info.language_probability:.2f})')
for segment in segments:
    print(f'[{segment.start:.1f}s - {segment.end:.1f}s] {segment.text}')
    for word in segment.words:
        print(f'  {word.word} ({word.start:.2f}s - {word.end:.2f}s)')
This transcribes audio with word-level timestamps, useful for subtitle generation, keyword search indexing, and speaker diarization preprocessing.
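For instance, the segments above can be written straight out as an SRT subtitle file. A minimal sketch, assuming segments comes from a fresh transcribe() call (the generator is consumed once); the srt_timestamp helper is illustrative, not part of faster-whisper:
def srt_timestamp(seconds):
    # SRT uses HH:MM:SS,mmm with a comma before the milliseconds.
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f'{h:02d}:{m:02d}:{s:02d},{ms:03d}'

with open('meeting.srt', 'w', encoding='utf-8') as f:
    for i, segment in enumerate(segments, start=1):
        f.write(f'{i}\n{srt_timestamp(segment.start)} --> {srt_timestamp(segment.end)}\n')
        f.write(f'{segment.text.strip()}\n\n')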
Related on TokRepo
- AI voice tools -- Explore speech-to-text and voice synthesis tools
- Local LLM runners -- Run AI models privately on your hardware
Common pitfalls
- The large-v3 model requires approximately 3GB of VRAM with float16. If your GPU runs out of memory, use a smaller model (medium, small) or switch to INT8 quantization, which roughly halves memory use relative to float16; a fallback sketch follows this list.
- Audio files must be decoded before transcription. Faster Whisper handles common formats (MP3, WAV, FLAC) through PyAV, which bundles the FFmpeg libraries, so unlike the original Whisper it does not require a system-wide ffmpeg install.
- Word-level timestamps add processing overhead. Disable word_timestamps when you only need segment-level timing to improve throughput.
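As noted in the first pitfall, one simple guard against GPU out-of-memory errors is to fall back to CPU INT8 when the GPU load fails. A minimal sketch; the broad exception handling is an assumption, since the exact exception raised depends on how the load fails:
from faster_whisper import WhisperModel

def load_model(size='large-v3'):
    try:
        # Prefer GPU float16 when the model fits in VRAM.
        return WhisperModel(size, device='cuda', compute_type='float16')
    except (RuntimeError, ValueError):  # illustrative: exact exception depends on the failure
        # Fall back to CPU int8, which needs far less memory.
        return WhisperModel(size, device='cpu', compute_type='int8')

model = load_model('medium')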
Frequently Asked Questions
How much faster is Faster Whisper than the original Whisper?
Faster Whisper achieves up to 4x faster transcription on GPU and significant improvements on CPU. The exact speedup depends on the model size, compute type, and hardware. INT8 quantization on CPU provides the largest improvement over the original PyTorch implementation.
Is the transcription quality the same as OpenAI Whisper?
Faster Whisper uses the same model weights and produces equivalent transcription quality. Minor differences in floating-point arithmetic between CTranslate2 and PyTorch may cause negligible variations in word boundaries, but the text output is functionally identical.
Can Faster Whisper transcribe audio in real time?
Faster Whisper processes audio faster than real-time on GPU, making it suitable for near-real-time applications. However, it processes complete audio segments, not streaming audio. For true streaming, you need to feed audio in chunks and manage segment boundaries yourself, as sketched below.
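A minimal sketch of chunked feeding, assuming decode_audio from faster_whisper (which returns a 16 kHz float32 array) and ignoring the word-boundary problem at chunk edges:
from faster_whisper import WhisperModel, decode_audio

SAMPLE_RATE = 16000
CHUNK_SECONDS = 30  # illustrative; naive cuts can split words at chunk edges

model = WhisperModel('small', device='cpu', compute_type='int8')
audio = decode_audio('call.wav', sampling_rate=SAMPLE_RATE)

for start in range(0, len(audio), CHUNK_SECONDS * SAMPLE_RATE):
    chunk = audio[start:start + CHUNK_SECONDS * SAMPLE_RATE]
    segments, _ = model.transcribe(chunk)
    offset = start / SAMPLE_RATE
    for segment in segments:
        # Segment times are relative to the chunk, so add the chunk offset.
        print(f'[{offset + segment.start:.1f}s] {segment.text}')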
Which languages does Faster Whisper support?
Faster Whisper supports all 99 languages that OpenAI Whisper supports. Language detection is automatic, or you can specify the language parameter to skip detection and improve speed. Quality varies by language and model size.
Can I fine-tune models with Faster Whisper?
No. Faster Whisper is an inference-only library. Fine-tuning must be done with the original Whisper implementation or compatible training frameworks. After fine-tuning, you can convert the model to CTranslate2 format for fast inference with Faster Whisper.
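The conversion step can also be done from Python. A sketch using CTranslate2's Transformers converter (the paths are placeholders, and the exact options are documented by CTranslate2):
from ctranslate2.converters import TransformersConverter

# Convert a fine-tuned Hugging Face Whisper checkpoint to CTranslate2 format.
# Requires the transformers package to be installed.
converter = TransformersConverter('path/to/finetuned-whisper')
converter.convert('finetuned-whisper-ct2', quantization='float16')

# The output directory loads like any other model:
# model = WhisperModel('finetuned-whisper-ct2', device='cuda', compute_type='float16')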
Citations (3)
- Faster Whisper GitHub — Up to 4x faster transcription using CTranslate2
- OpenAI Whisper — Based on OpenAI Whisper model architecture
- CTranslate2 — CTranslate2 inference engine optimizations
Source & Thanks
Created by SYSTRAN. Licensed under MIT. SYSTRAN/faster-whisper — 21,800+ GitHub stars
Related Assets
NAPI-RS — Build Node.js Native Addons in Rust
Write high-performance Node.js native modules in Rust with automatic TypeScript type generation and cross-platform prebuilt binaries.
Mamba — Fast Cross-Platform Package Manager
A drop-in conda replacement written in C++ that resolves environments in seconds instead of minutes.
Plasmo — The Browser Extension Framework
Build, test, and publish browser extensions for Chrome, Firefox, and Edge using React or Vue with hot-reload and automatic manifest generation.