Scripts · Mar 29, 2026 · 2 min read

Whisper — OpenAI Speech-to-Text

OpenAI's open-source speech recognition model. Transcribe audio/video to text with word-level timestamps in 99 languages. Essential for subtitle generation.

TokRepo Picks · Community
Quick Use

pip install openai-whisper   # FFmpeg must also be installed and on your PATH
whisper audio.mp3 --model medium --language en --output_format srt

Intro

OpenAI's Whisper is an open-source speech recognition model trained on 680,000 hours of multilingual data. It transcribes audio to text in 99 languages with segment-level timestamps (word-level timestamps are available as an option), generates SRT/VTT subtitles, and copes well with accents, background noise, and technical jargon. With 75,000+ GitHub stars, it is the foundation for many AI subtitle generation pipelines.

Best for: Subtitle generation, podcast transcription, video content indexing, multilingual transcription
Works with: Python 3.8+, FFmpeg
Setup time: 3 minutes (+ model download)


Models

Model      Parameters   Relative speed   Accuracy   VRAM
tiny       39M          ~10x             Good       ~1GB
base       74M          ~7x              Better     ~1GB
small      244M         ~4x              Good+      ~2GB
medium     769M         ~2x              Great      ~5GB
large-v3   1.5B         1x (baseline)    Best       ~10GB

Relative speed is measured against the large model, not against realtime; actual throughput depends on your hardware.
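
As a rough illustration of how the table can guide model choice, the sketch below picks the largest model whose approximate VRAM requirement fits a given budget. The `pick_model` helper and the `MODEL_VRAM_GB` mapping are hypothetical names introduced here, not part of the whisper package, and the VRAM figures are the approximate values from the table.

```python
# Approximate VRAM needs in GB, copied from the table above (rough guides only).
# MODEL_VRAM_GB and pick_model are illustrative names, not part of whisper.
MODEL_VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large-v3": 10,
}


def pick_model(available_vram_gb: float) -> str:
    """Return the largest model whose approximate VRAM need fits the budget.

    Falls back to "tiny" when nothing fits, on the assumption that the
    smallest model is the safest thing to try.
    """
    best = "tiny"
    # The dict is ordered from smallest to largest model.
    for name, needed_gb in MODEL_VRAM_GB.items():
        if needed_gb <= available_vram_gb:
            best = name
    return best


print(pick_model(6))
```

The returned name can be passed to whisper.load_model() or to the CLI's --model flag.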

Python API

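import whisper

# Downloads the model weights on first use, then loads them from the local cache
model = whisper.load_model("medium")
# word_timestamps=True adds a "words" list to each segment
result = model.transcribe("audio.mp3", word_timestamps=True)

# Print each segment with its start and end time in seconds
for segment in result["segments"]:
    print(f"[{segment['start']:.1f}s - {segment['end']:.1f}s] {segment['text']}")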
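
When word_timestamps=True is set, each segment should also carry a "words" list whose entries hold the word text plus its start and end times. The sketch below shows one way to flatten those into a single stream. The `iter_words` helper is a hypothetical name, and the `sample` dict is hand-written to mimic the result shape so the code runs without loading a model; it is not real model output.

```python
from typing import Iterator, Tuple


def iter_words(result: dict) -> Iterator[Tuple[str, float, float]]:
    """Yield (word, start, end) for every word in a transcribe()-style result."""
    for segment in result.get("segments", []):
        # "words" is only present when word timestamps were requested.
        for w in segment.get("words", []):
            yield w["word"].strip(), w["start"], w["end"]


# Hand-written sample shaped like a transcribe() result (assumption, not model output).
sample = {
    "segments": [
        {
            "start": 0.0,
            "end": 1.2,
            "text": " Hello world",
            "words": [
                {"word": " Hello", "start": 0.0, "end": 0.5},
                {"word": " world", "start": 0.6, "end": 1.2},
            ],
        }
    ]
}

for word, start, end in iter_words(sample):
    print(f"{start:5.2f}s - {end:5.2f}s  {word}")
```

In real use, the `result` returned by model.transcribe() would be passed in place of `sample`.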

Output Formats

whisper audio.mp3 --output_format srt   # SubRip subtitles
whisper audio.mp3 --output_format vtt   # WebVTT subtitles
whisper audio.mp3 --output_format json  # Detailed JSON with segment timestamps (add --word_timestamps True for per-word times)
whisper audio.mp3 --output_format txt   # Plain text
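
To make the SRT format concrete, here is a minimal sketch that turns a list of segment dicts (with "start", "end", and "text" keys, as in the Python API result) into SubRip text. The `srt_timestamp` and `segments_to_srt` helpers are hypothetical names written for illustration; the CLI's own srt writer may differ in details such as line wrapping.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SubRip."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text: a numbered cue, a time range, and the text per segment."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


# Hand-written segments for illustration, not real model output.
demo = [
    {"start": 0.0, "end": 1.5, "text": " Hello there."},
    {"start": 1.5, "end": 3.0, "text": " Welcome back."},
]
print(segments_to_srt(demo))
```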

FAQ

Q: What is Whisper? A: OpenAI's open-source speech recognition model. It transcribes audio to text in 99 languages, with optional word-level timestamps, and has 75,000+ GitHub stars.

Q: Is Whisper free? A: Yes. Whisper is MIT-licensed and runs locally on your machine. No API costs.

Q: What languages does Whisper support? A: 99 languages including English, Chinese, Spanish, French, German, Japanese, Korean, Arabic, and more.


Source & Thanks

Created by OpenAI. Licensed under MIT. Repository: openai/whisper (75,000+ GitHub stars).
