whisper.cpp — Local Speech-to-Text in Pure C/C++
High-performance port of OpenAI Whisper in C/C++. No Python, no GPU required. Runs on CPU, Apple Silicon, CUDA, and even Raspberry Pi. Real-time transcription.
Agent 可直接安装
这个资产可安装;Agent 先选择当前运行时、检查安装计划,再运行匹配命令。
npx -y tokrepo@latest install e1fd7c46-bbda-4956-8649-9c3ed579ff25 --target codex先 dry-run 确认安装计划,再运行此命令。
The Python tax on Whisper
OpenAI's original Whisper release depends on PyTorch, which means a 2GB dependency tree, GPU VRAM pressure, and slow cold starts. whisper.cpp is a ground-up port in pure C/C++ with zero runtime dependencies — a single 5MB binary that transcribes audio faster than realtime on any modern laptop.
60-second install
# macOS
brew install whisper-cpp
whisper-cpp -m /path/to/model.bin -f audio.wav
# From source (any OS)
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
cmake -B build
cmake --build build --config Release
# Download a model
bash models/download-ggml-model.sh base.en
# Transcribe
./build/bin/whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav
Model sizes and speed tradeoffs (M2 Pro MacBook, 16GB)
| Model | Size | Speed (60s audio) | WER (English) |
|---|---|---|---|
| tiny | 39 MB | 1.2s | 14% |
| base | 74 MB | 2.8s | 9% |
| small | 244 MB | 8.5s | 6% |
| medium | 769 MB | 21s | 4.5% |
| large-v3 | 1.5 GB | 58s | 2.8% |
For most voice-memo and meeting transcription, base.en is the sweet spot. For multilingual or professional audio, use large-v3.
Platforms that work
- macOS (Apple Silicon: CoreML + Metal GPU acceleration)
- Linux (CUDA, AMD ROCm, Vulkan, pure CPU)
- Windows (MSVC + CUDA)
- Raspberry Pi 4/5
- iOS / Android (via JNI + Core ML)
- WebAssembly (yes, whisper.cpp runs in the browser)
Real-time microphone transcription
./build/bin/stream -m models/ggml-base.en.bin --step 500 --length 5000
Streaming mode updates every 500ms with a 5s rolling window — good enough for live captioning, meeting notes, or voice-controlled agents.
Integration with AI agents
Pattern: voice-to-text feed for Claude Code
# Record voice memo
ffmpeg -f avfoundation -i ":0" -t 30 memo.wav
# Transcribe with whisper.cpp
TEXT=$(whisper-cpp -m ~/models/ggml-base.en.bin -f memo.wav -otxt -nt)
# Pipe into Claude
claude "$TEXT"
Sub-1 second latency on M2+ machines. Entirely offline.
Quantization for low-RAM devices
whisper.cpp supports 4-bit quantization via ggml — shrinks large-v3 from 1.5GB to 380MB with ~1% WER increase. Essential for mobile and Pi deployments:
./build/bin/quantize models/ggml-large-v3.bin models/ggml-large-v3-q4.bin q4_0
Benchmarks vs alternatives (2026)
| Solution | Latency (60s audio, CPU) | Offline | Dependencies |
|---|---|---|---|
| whisper.cpp (base) | 2.8s | ✅ | None |
| openai-whisper (Python) | 38s | ✅ | PyTorch 2GB |
| faster-whisper (CTranslate2) | 6.1s | ✅ | CTranslate2 |
| Cloud APIs (Deepgram) | network-bound | ❌ | API key |
whisper.cpp wins on offline latency by 3-10× margins.
Known limitations
- Speaker diarization not built-in. Pair with
whisper-diarizationorpyannote.audiofor who-said-what. - Long audio memory cost —
large-v3needs ~3GB RAM for continuous transcription. Chunk with the-snsflag. - Punctuation in streaming mode is weaker than offline mode — use non-streaming for final transcripts.
常见问题
Yes. whisper.cpp is a pure C/C++ implementation with no Python dependency at runtime. The binary is approximately 5MB and has no external runtime requirements other than the model file. This makes it ideal for embedded, mobile, and serverless deployments.
On an M2 Pro MacBook, whisper.cpp transcribes 60 seconds of audio with the base.en model in approximately 2.8 seconds. The Python openai-whisper equivalent takes around 38 seconds on the same hardware — a 13x speedup, primarily from Metal GPU acceleration.
For most voice memos and meeting transcription in English, base.en offers the best speed-accuracy tradeoff at 74MB and approximately 9% word error rate. For professional audio or multilingual content, use large-v3 at 1.5GB with 2.8% WER. Tiny is only recommended for extremely low-resource devices.
Yes. whisper.cpp supports Raspberry Pi 4 and 5 with ARM NEON optimizations. With the tiny or base model plus 4-bit quantization, it achieves near-realtime transcription on Pi 5. The quantized large-v3-q4 model runs but transcribes slower than realtime.
Yes. The stream binary processes audio in rolling windows with configurable step and length parameters. A typical 500ms step with 5-second window gives you live captioning suitable for meeting notes or voice-controlled agents.
引用来源 (3)
- whisper.cpp benchmarks— whisper.cpp achieves 2.8s for 60s audio on M2 Pro with base.en model
- whisper.cpp GitHub— 37K+ GitHub stars
- OpenAI Whisper paper— Whisper model originally released by OpenAI in 2022
来源与感谢
- GitHub: ggerganov/whisper.cpp — 37,000+ stars, MIT License
- By Georgi Gerganov (also creator of llama.cpp)
讨论
相关资产
llama.cpp — Run LLMs Locally in Pure C/C++
llama.cpp is a C/C++ LLM inference engine with 100K+ GitHub stars. Runs on CPU, Apple Silicon, NVIDIA, AMD GPUs. 1.5-8 bit quantization, no dependencies, supports 50+ model architectures. MIT licensed
Faster Whisper — 4x Faster Speech-to-Text
Faster Whisper is a reimplementation of OpenAI Whisper using CTranslate2, up to 4x faster with less memory. 21.8K+ GitHub stars. GPU/CPU, 8-bit quantization, word timestamps, VAD. MIT licensed.
Whisper — OpenAI Speech-to-Text
OpenAI's open-source speech recognition model. Transcribe audio/video to text with word-level timestamps in 99 languages. Essential for subtitle generation.
Groq Whisper — Sub-Second Speech-to-Text for Voice Agents
Whisper-large-v3 on Groq runs 166× realtime — 60-sec clip in <400ms. OpenAI-compat audio.transcriptions endpoint for voice agents.