Scripts · Mar 31, 2026 · 2 min read

Faster Whisper — 4x Faster Speech-to-Text

Faster Whisper is a reimplementation of OpenAI Whisper using CTranslate2, up to 4x faster with less memory. 21.8K+ GitHub stars. GPU/CPU, 8-bit quantization, word timestamps, VAD. MIT licensed.

TL;DR
Faster Whisper runs OpenAI Whisper models up to 4x faster using CTranslate2 with reduced memory usage.
§01

What it is

Faster Whisper is a reimplementation of OpenAI's Whisper automatic speech recognition model using CTranslate2, a fast inference engine for Transformer models. It achieves up to 4x faster transcription compared to the original Whisper implementation while using less memory. The library supports GPU and CPU inference, 8-bit quantization, and all Whisper model sizes from tiny to large-v3.

Faster Whisper targets developers building transcription pipelines, voice-enabled applications, and audio processing workflows who need Whisper-quality results with better performance.

§02

How it saves time or tokens

The original Whisper implementation uses PyTorch and processes audio at roughly real-time speed on CPU. Faster Whisper's CTranslate2 backend optimizes the computation graph, uses INT8 quantization, and batches operations more efficiently. The result is transcription that completes in a fraction of the audio duration, even on CPU.

For batch processing of audio files, the speed improvement means processing a day's worth of recordings in hours instead of a full day. GPU acceleration with float16 further reduces processing time.
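As a back-of-the-envelope illustration of that claim, the arithmetic below converts an assumed real-time factor (RTF) into wall-clock processing time. The 4x CPU speedup figure is taken from the article; the exact RTF on your hardware will differ.

```python
# Back-of-the-envelope estimate of batch transcription time.
# RTF values here are illustrative assumptions, not measured
# benchmarks; real numbers depend on model size and hardware.

AUDIO_HOURS = 24.0          # a day's worth of recordings
RTF_ORIGINAL_CPU = 1.0      # original Whisper: roughly real-time on CPU
RTF_FASTER_INT8_CPU = 0.25  # assumed 4x speedup with INT8 on CPU

def processing_hours(audio_hours: float, rtf: float) -> float:
    """Wall-clock hours needed to transcribe at a given real-time factor."""
    return audio_hours * rtf

print(processing_hours(AUDIO_HOURS, RTF_ORIGINAL_CPU))    # 24.0
print(processing_hours(AUDIO_HOURS, RTF_FASTER_INT8_CPU)) # 6.0
```

Under these assumptions, a day of audio that would occupy the original implementation for a full day finishes in about six hours on the same CPU.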

§03

How to use

  1. Install Faster Whisper:
pip install faster-whisper
  2. Transcribe an audio file:
from faster_whisper import WhisperModel

model = WhisperModel('large-v3', device='cuda', compute_type='float16')
segments, info = model.transcribe('audio.mp3')

for segment in segments:
    print(f'[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}')
  3. For CPU with INT8 quantization:
model = WhisperModel('large-v3', device='cpu', compute_type='int8')
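The VAD support mentioned above is enabled through `transcribe()`'s `vad_filter` and `vad_parameters` options. The helper below only assembles those keyword arguments (the helper name is hypothetical, not part of the faster-whisper API); it does not load or run a model.

```python
# Sketch: voice-activity-detection (VAD) filtering drops long silences
# before decoding. vad_filter / vad_parameters are faster-whisper
# transcribe() options; this helper just builds the kwargs dict.
def vad_transcribe_kwargs(min_silence_ms: int = 500) -> dict:
    """Keyword arguments enabling Silero VAD preprocessing."""
    return {
        "vad_filter": True,
        "vad_parameters": {"min_silence_duration_ms": min_silence_ms},
    }

# usage (assumes a loaded WhisperModel named `model`):
# segments, info = model.transcribe("audio.mp3", **vad_transcribe_kwargs())
```

Skipping silence this way shortens the effective audio length, so it compounds with the CTranslate2 speedups on recordings with long pauses.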
§04

Example

from faster_whisper import WhisperModel
import os

model = WhisperModel('medium', device='cpu', compute_type='int8')

# Transcribe with word-level timestamps
segments, info = model.transcribe(
    'meeting.wav',
    beam_size=5,
    word_timestamps=True,
    language='en'
)

print(f'Detected language: {info.language} (prob: {info.language_probability:.2f})')

for segment in segments:
    print(f'[{segment.start:.1f}s - {segment.end:.1f}s] {segment.text}')
    for word in segment.words:
        print(f'  {word.word} ({word.start:.2f}s - {word.end:.2f}s)')

This transcribes audio with word-level timestamps, useful for subtitle generation, keyword search indexing, and speaker diarization preprocessing.
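For the subtitle-generation use case, the segment timestamps above map directly onto SRT cues. A minimal formatting sketch (the helper names are hypothetical, not part of the faster-whisper API):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_block(index: int, start: float, end: float, text: str) -> str:
    """One numbered SRT cue built from a segment's start, end, and text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"

# e.g. feed each faster-whisper segment through srt_block in order:
print(srt_block(1, 0.0, 2.5, " Hello world"))
```

Joining consecutive `srt_block` outputs with blank lines yields a complete `.srt` file.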

§05

Common pitfalls

  • The large-v3 model requires approximately 3GB of VRAM with float16. If your GPU runs out of memory, use a smaller model (medium, small) or switch to INT8 quantization which halves memory usage.
  • Audio files must be decoded before transcription. Faster Whisper decodes common formats (MP3, WAV, FLAC) through the PyAV library, which bundles the FFmpeg libraries, so unlike openai-whisper it does not require a system-wide ffmpeg install.
  • Word-level timestamps add processing overhead. Disable word_timestamps when you only need segment-level timing to improve throughput.

Frequently Asked Questions

How much faster is Faster Whisper compared to original Whisper?

Faster Whisper achieves up to 4x faster transcription on GPU and significant improvements on CPU. The exact speedup depends on the model size, compute type, and hardware. INT8 quantization on CPU provides the largest improvement over the original PyTorch implementation.

Does Faster Whisper produce the same output as original Whisper?

Faster Whisper uses the same model weights and produces equivalent transcription quality. Minor differences in floating-point arithmetic between CTranslate2 and PyTorch may cause negligible variations in word boundaries, but the text output is functionally identical.

Can I use Faster Whisper for real-time transcription?

Faster Whisper processes audio faster than real-time on GPU, making it suitable for near-real-time applications. However, it processes complete audio segments, not streaming audio. For true streaming, you need to feed audio in chunks and manage segment boundaries yourself.
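A minimal sketch of the chunking approach described above, splitting a mono sample buffer into overlapping windows that could each be passed to `model.transcribe()` independently. The 30s chunk size and 1s overlap are illustrative assumptions; real streaming also needs logic to merge text across the overlaps.

```python
def chunk_samples(samples: list, sample_rate: int,
                  chunk_s: float = 30.0, overlap_s: float = 1.0) -> list:
    """Split a mono sample buffer into overlapping fixed-length chunks.

    Each chunk is chunk_s seconds long; consecutive chunks overlap by
    overlap_s seconds so words straddling a boundary appear in both.
    """
    size = int(chunk_s * sample_rate)
    step = int((chunk_s - overlap_s) * sample_rate)
    return [samples[start:start + size]
            for start in range(0, len(samples), step)]

# e.g. 65 seconds of 16 kHz audio -> three overlapping 30s windows
chunks = chunk_samples(list(range(16_000 * 65)), 16_000)
print(len(chunks))  # 3
```

Segment boundary management (deduplicating the overlap region) is the hard part of streaming and is left out of this sketch.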

What languages does Faster Whisper support?

Faster Whisper supports all 99 languages that OpenAI Whisper supports. Language detection is automatic, or you can specify the language parameter to skip detection and improve speed. Quality varies by language and model size.

Can I fine-tune Faster Whisper models?

No. Faster Whisper is an inference-only library. Fine-tuning must be done with the original Whisper implementation or compatible training frameworks. After fine-tuning, you can convert the model to CTranslate2 format for fast inference with Faster Whisper.


Source & Thanks

Created by SYSTRAN. Licensed under MIT. SYSTRAN/faster-whisper — 21,800+ GitHub stars
