Scripts · March 29, 2026 · 1 min read

Whisper — OpenAI Speech-to-Text

OpenAI's open-source speech recognition model. Transcribe audio/video to text with word-level timestamps in 99 languages. Essential for subtitle generation.

TokRepo Picks · Community
Quick Start

Try it first, then decide whether to dig deeper


pip install openai-whisper
whisper audio.mp3 --model medium --language en --output_format srt
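To run the same command over a whole folder, one option is to build the argument list in Python and hand it to subprocess. This is a minimal sketch: the helper names build_whisper_command and commands_for_folder are hypothetical, and it assumes the whisper CLI installed above is on your PATH. The only flag used beyond the command above is --output_dir.

```python
from pathlib import Path


def build_whisper_command(audio_path, model="medium", language="en",
                          output_format="srt", output_dir="."):
    """Return the argv list for one whisper CLI call (hypothetical helper)."""
    return [
        "whisper", str(audio_path),
        "--model", model,
        "--language", language,
        "--output_format", output_format,
        "--output_dir", str(output_dir),
    ]


def commands_for_folder(folder, pattern="*.mp3", **options):
    """Build one command per matching file, in sorted order."""
    return [build_whisper_command(path, **options)
            for path in sorted(Path(folder).glob(pattern))]
```

Each command can then be executed with subprocess.run(cmd, check=True), which raises if whisper exits with an error.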

Introduction

OpenAI's Whisper is an open-source speech recognition model trained on 680,000 hours of multilingual audio. It transcribes speech in 99 languages, can emit word-level timestamps, writes SRT/VTT subtitles, and copes well with accents, background noise, and technical jargon. With 75,000+ GitHub stars, it is the foundation for many AI subtitle generation pipelines.

Best for: Subtitle generation, podcast transcription, video content indexing, multilingual transcription
Works with: Python 3.8+, FFmpeg
Setup time: 3 minutes (+ model download)


Models

Model      Parameters   Relative speed   Accuracy   VRAM
tiny       39M          ~10x             Good       ~1 GB
base       74M          ~7x              Better     ~1 GB
small      244M         ~4x              Good+      ~2 GB
medium     769M         ~2x              Great      ~5 GB
large-v3   1.55B        ~1x              Best       ~10 GB

Speeds are relative to the large model, not to real time; actual throughput depends on your hardware.
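The table can be turned into a simple rule for choosing a model from the GPU memory you have. The sketch below is illustrative only: pick_model is a hypothetical helper, and the thresholds are the approximate VRAM figures from the table, not measured values.

```python
# Approximate VRAM needs in GB, taken from the table above (largest first).
MODEL_VRAM_GB = [
    ("large-v3", 10),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
    ("tiny", 1),
]


def pick_model(available_vram_gb):
    """Return the largest model whose approximate VRAM need fits."""
    for name, needed_gb in MODEL_VRAM_GB:
        if available_vram_gb >= needed_gb:
            return name
    # Below every threshold: fall back to the smallest model.
    return "tiny"
```

The result can be passed straight to whisper.load_model or to the --model flag.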

Python API

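import whisper

# Downloads the model weights on first use, then loads them from the local cache
model = whisper.load_model("medium")
# word_timestamps=True also adds a per-word "words" list to each segment
result = model.transcribe("audio.mp3", word_timestamps=True)

# Print each segment with its start and end time in seconds
for segment in result["segments"]:
    print(f"[{segment['start']:.1f}s - {segment['end']:.1f}s] {segment['text'].strip()}")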
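If you want subtitles from the Python API rather than the CLI, the segments can be formatted as SRT by hand. This is a minimal sketch, assuming each segment is a dict with the "start", "end", and "text" keys used in the loop above; srt_timestamp and segments_to_srt are hypothetical helper names, not part of the whisper package.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm, the form SRT expects."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Turn a list of segment dicts into the text of an SRT file."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_line = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{time_line}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)
```

With the result from the example above, segments_to_srt(result["segments"]) returns a string you can write to a .srt file with UTF-8 encoding.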

Output Formats

whisper audio.mp3 --output_format srt   # SubRip subtitles
whisper audio.mp3 --output_format vtt   # WebVTT subtitles
whisper audio.mp3 --output_format json  # Full JSON; add --word_timestamps True for per-word timing
whisper audio.mp3 --output_format txt   # Plain text
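The JSON output is the easiest format to post-process. The sketch below walks it and yields each word with its timing. It assumes the file mirrors the dict returned by model.transcribe, with a "segments" list and, when word timestamps were enabled, a "words" list in each segment. load_result and iter_words are hypothetical helper names.

```python
import json


def load_result(path):
    """Read a whisper JSON output file into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def iter_words(result):
    """Yield (word, start, end) for every word that carries timing info.

    Segments without a "words" list (word timestamps disabled) are skipped.
    """
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            yield word["word"].strip(), word["start"], word["end"]
```

For example, list(iter_words(load_result("audio.json"))) gives a flat list of timed words, assuming audio.json was produced by one of the commands above with word timestamps enabled.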

FAQ

Q: What is Whisper?
A: OpenAI's open-source speech recognition model. It transcribes audio to text in 99 languages with word-level timestamps and has 75,000+ GitHub stars.

Q: Is Whisper free?
A: Yes. Whisper is MIT-licensed and runs locally on your machine, so there are no API costs.

Q: What languages does Whisper support?
A: 99 languages, including English, Chinese, Spanish, French, German, Japanese, Korean, and Arabic.



Source and Credits

Created by OpenAI. Licensed under MIT. whisper: ⭐ 75,000+
