Code · Apr 2, 2026 · 2 min read

whisper.cpp — Local Speech-to-Text in Pure C/C++

High-performance port of OpenAI Whisper in C/C++. No Python, no GPU required. Runs on plain CPUs, Apple Silicon, NVIDIA GPUs via CUDA, and even a Raspberry Pi, with real-time transcription.

TL;DR
whisper.cpp runs OpenAI Whisper locally in C/C++ with no Python and no internet needed.
§01

What it is

whisper.cpp is a high-performance C/C++ port of OpenAI's Whisper speech recognition model by Georgi Gerganov (creator of llama.cpp). It runs entirely locally with zero dependencies: no Python, no PyTorch, no internet connection needed.

The key advantage: it runs efficiently on CPU. Apple Silicon gets 4-8x speedup via Core ML and Metal. NVIDIA GPUs work via CUDA. Even a Raspberry Pi can transcribe audio. Real-time streaming transcription works on modern laptops.
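GPU acceleration is opt-in at build time. A minimal sketch of the relevant CMake flags, following the project README (flag names can change between releases, so verify against the README for your version; Metal is typically active by default on Apple Silicon builds):

# CPU-only build (default)
cmake -B build && cmake --build build --config Release
# NVIDIA GPUs via CUDA
cmake -B build -DGGML_CUDA=1 && cmake --build build --config Release
# Apple Core ML (requires generating the Core ML encoder model first, per the README)
cmake -B build -DWHISPER_COREML=1 && cmake --build build --config Release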

§02

How it saves time or tokens

whisper.cpp provides speech-to-text without cloud API costs or latency. Traditional Whisper requires Python, PyTorch, and ideally a GPU. whisper.cpp runs on any hardware with a single binary. For privacy-sensitive applications, all processing stays on-device. The tiny model (75 MB) transcribes at 32x real-time on CPU, making it practical for batch processing of audio archives.
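As a concrete example, a batch job over an audio archive is just a shell loop around ffmpeg and whisper-cli. The paths and model choice below are illustrative, not part of the project:

# Hypothetical batch job: convert each file to 16 kHz mono WAV, then transcribe
for f in archive/*.mp3; do
  wav="${f%.mp3}.wav"
  ffmpeg -i "$f" -ar 16000 -ac 1 -c:a pcm_s16le "$wav"
  ./build/bin/whisper-cli -m models/ggml-tiny.en.bin -f "$wav" -otxt   # writes a .txt next to each input
done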

§03

How to use

  1. Clone, build, and download a model:
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
cmake -B build
cmake --build build --config Release
bash models/download-ggml-model.sh base.en
  2. Transcribe an audio file:
./build/bin/whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav
  3. Real-time microphone transcription:
./build/bin/whisper-stream -m models/ggml-base.en.bin
# Speak into your microphone -- text appears in real time
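A few other whisper-cli options are worth knowing. This is a small sample, so confirm the exact flags with ./build/bin/whisper-cli --help on your build:

./build/bin/whisper-cli -m models/ggml-base.bin -f audio.wav -l de    # force the source language (multilingual model)
./build/bin/whisper-cli -m models/ggml-base.bin -f audio.wav -tr      # translate the transcript to English
./build/bin/whisper-cli -m models/ggml-base.en.bin -f audio.wav -t 8  # use 8 CPU threads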
§04

Example

Model size comparison for different use cases:

| Model  | Disk    | RAM     | Speed (CPU)    | Quality          |
|--------|---------|---------|----------------|------------------|
| tiny   | 75 MB   | ~390 MB | ~32x real-time | Good for drafts  |
| base   | 142 MB  | ~500 MB | ~16x real-time | Solid accuracy   |
| small  | 466 MB  | ~1 GB   | ~6x real-time  | Good quality     |
| medium | 1.5 GB  | ~2.6 GB | ~2x real-time  | High quality     |
| large  | 2.9 GB  | ~4.7 GB | ~1x real-time  | Best quality     |
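Any size in the table can be fetched with the same download script used during setup. English-only variants take an .en suffix; the exact tags for the large model (for example large-v3) vary by release, so check models/download-ggml-model.sh for the current list:

bash models/download-ggml-model.sh tiny.en
bash models/download-ggml-model.sh small
bash models/download-ggml-model.sh medium.en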
# Output formats
./build/bin/whisper-cli -m models/ggml-base.en.bin -f audio.wav -otxt   # Plain text
./build/bin/whisper-cli -m models/ggml-base.en.bin -f audio.wav -osrt   # SRT subtitles
./build/bin/whisper-cli -m models/ggml-base.en.bin -f audio.wav -ovtt   # VTT subtitles
./build/bin/whisper-cli -m models/ggml-base.en.bin -f audio.wav -ojson  # JSON with timestamps
§05

Common pitfalls

  • Using the large model on hardware without a GPU leads to very slow transcription. Start with base or small for CPU-only setups.
  • Audio files must be 16kHz 16-bit mono WAV. Convert other formats with ffmpeg before processing.
  • Real-time streaming requires a low-latency audio capture setup. Ensure your microphone input is configured correctly for the whisper-stream binary, and see the build note after this list.
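On the streaming point: whisper-stream is not built by default because it needs SDL2 for microphone capture. A minimal sketch of the extra build step, following the project README (install the SDL2 development package for your platform first):

# e.g. libsdl2-dev on Debian/Ubuntu, or sdl2 via Homebrew on macOS
cmake -B build -DWHISPER_SDL2=ON
cmake --build build --config Release
./build/bin/whisper-stream -m models/ggml-base.en.bin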

Frequently Asked Questions

Does whisper.cpp require a GPU?

No. whisper.cpp runs on CPU by default. GPU acceleration via CUDA (NVIDIA), Metal (Apple), and Core ML (Apple) is optional and provides significant speedups. Even a Raspberry Pi can run the tiny model.

How does whisper.cpp compare to the original Python Whisper?

whisper.cpp provides the same transcription quality (it uses the same model weights) but runs without Python dependencies. It is faster on CPU and uses less memory. The tradeoff is that it requires manual compilation.

What audio formats does whisper.cpp support?

whisper.cpp requires 16kHz 16-bit mono WAV input. Convert other formats using ffmpeg: ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav.

Can whisper.cpp do real-time transcription?

Yes. The whisper-stream binary captures audio from your microphone and transcribes it in real time. This works with the tiny and base models on modern hardware.

What output formats are available?

whisper.cpp outputs plain text, SRT subtitles, VTT subtitles, JSON with timestamps, and CSV. Choose the format with the -otxt, -osrt, -ovtt, -ojson, or -ocsv flag.


Source & Thanks

  • GitHub: ggerganov/whisper.cpp — 37,000+ stars, MIT License
  • By Georgi Gerganov (also creator of llama.cpp)
