## Models
| Model | Parameters | Speed | Accuracy | VRAM |
|---|---|---|---|---|
| tiny | 39M | ~10x realtime | Fair | ~1 GB |
| base | 74M | ~7x realtime | Good | ~1 GB |
| small | 244M | ~4x realtime | Better | ~2 GB |
| medium | 769M | ~2x realtime | Great | ~5 GB |
| large-v3 | 1.5B | ~1x realtime | Best | ~10 GB |
## Python API

```python
import whisper

model = whisper.load_model("medium")
result = model.transcribe("audio.mp3", word_timestamps=True)
for segment in result["segments"]:
    print(f"[{segment['start']:.1f}s - {segment['end']:.1f}s] {segment['text']}")
```
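
With `word_timestamps=True`, recent versions of the `openai-whisper` package also attach a `"words"` list to each segment, with per-word `"word"`, `"start"`, and `"end"` fields (check your installed version). A sketch of flattening that structure, using an illustrative stand-in for a real `transcribe()` result:

```python
def words_with_times(result: dict) -> list[tuple[str, float, float]]:
    """Flatten a transcribe() result into (word, start, end) tuples."""
    return [
        (w["word"].strip(), w["start"], w["end"])
        for seg in result["segments"]
        for w in seg.get("words", [])  # "words" is present with word_timestamps=True
    ]

# Illustrative stand-in mimicking the shape of a transcribe() result:
sample = {"segments": [{"words": [
    {"word": " Hello", "start": 0.0, "end": 0.42},
    {"word": " world.", "start": 0.42, "end": 0.88},
]}]}
print(words_with_times(sample))  # [('Hello', 0.0, 0.42), ('world.', 0.42, 0.88)]
```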
## Output Formats

```shell
whisper audio.mp3 --output_format srt    # SubRip subtitles
whisper audio.mp3 --output_format vtt    # WebVTT subtitles
whisper audio.mp3 --output_format json   # Detailed JSON with word timestamps
whisper audio.mp3 --output_format txt    # Plain text
```
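
The CLI writes these files for you, but it can help to see what the SRT conversion amounts to. This sketch builds SubRip text from the `segments` structure returned by the Python API; the helper names are hypothetical, and the timestamp math follows the standard `HH:MM:SS,mmm` SRT format.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list[dict]) -> str:
    """Render transcribe()-style segments as SubRip (.srt) text."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

print(segments_to_srt([{"start": 0.0, "end": 2.5, "text": " Hello world."}]))
```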
## FAQ

**Q: What is Whisper?**
A: OpenAI's open-source speech recognition model, which transcribes audio to text in 99 languages and can produce word-level timestamps. Its GitHub repository has 75,000+ stars.

**Q: Is Whisper free?**
A: Yes. Whisper is MIT-licensed and runs locally on your machine, with no API costs.

**Q: What languages does Whisper support?**
A: 99 languages, including English, Chinese, Spanish, French, German, Japanese, Korean, Arabic, and more.