Scripts · Mar 29, 2026 · 2 min read

Whisper — OpenAI Speech-to-Text

OpenAI's open-source speech recognition model. Transcribe audio/video to text with word-level timestamps in 99 languages. Essential for subtitle generation.

Introduction

OpenAI's Whisper is an open-source speech recognition model trained on 680,000 hours of multilingual data. It transcribes audio to text in 99 languages, can produce word-level timestamps, and writes SRT/VTT subtitles. It copes well with accents, background noise, and technical jargon. With 75,000+ GitHub stars, it is the foundation for many AI subtitle generation pipelines.

Best for: Subtitle generation, podcast transcription, video content indexing, multilingual transcription
Works with: Python 3.8+, FFmpeg
Setup time: 3 minutes (+ model download)


Models

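Model     Parameters  Relative speed  Accuracy  VRAM
tiny      39M         ~10x            Good      ~1 GB
base      74M         ~7x             Better    ~1 GB
small     244M        ~4x             Good+     ~2 GB
medium    769M        ~2x             Great     ~5 GB
large-v3  1.55B       ~1x             Best      ~10 GB

Relative speed is measured against the large model, not against real time. Actual throughput depends on hardware and on the audio itself.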
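The VRAM column can drive model selection on a given GPU. The following is a minimal sketch under that assumption: `pick_model` and `MODEL_VRAM_GB` are hypothetical names, not part of the whisper package, and the figures are the approximate values from the table above.

```python
# Approximate VRAM needs in GB, copied from the models table.
# Listed from smallest to largest model.
MODEL_VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large-v3": 10,
}


def pick_model(available_vram_gb: float) -> str:
    """Return the largest model whose approximate VRAM need fits the budget.

    Hypothetical helper for illustration; not a whisper API.
    """
    fitting = [
        name for name, need in MODEL_VRAM_GB.items()
        if need <= available_vram_gb
    ]
    if not fitting:
        raise ValueError("No model in the table fits the given VRAM budget")
    # Dicts keep insertion order, so the last fitting entry is the largest.
    return fitting[-1]


if __name__ == "__main__":
    for budget in (1, 6, 24):
        print(f"{budget} GB -> {pick_model(budget)}")
```

The returned name can be passed straight to `whisper.load_model`. Treat the thresholds as rough guidance, since real memory use also depends on audio length and decoding options.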

Python API

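import whisper

# Load the medium checkpoint (downloaded on first use).
model = whisper.load_model("medium")
# word_timestamps=True adds a per-word "words" list to each segment.
result = model.transcribe("audio.mp3", word_timestamps=True)

# Print each segment with its start and end time in seconds.
for segment in result["segments"]:
    print(f"[{segment['start']:.1f}s - {segment['end']:.1f}s] {segment['text']}")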
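The loop above prints segment-level times only. When `word_timestamps=True` is set, each segment also carries a `words` list whose entries have `word`, `start`, and `end` keys. Here is a small sketch of reading them. `iter_words` is a hypothetical helper, and `sample_result` is a hand-written stand-in shaped like a Whisper result so the snippet runs without a model. Its values are illustrative, not real output.

```python
def iter_words(result):
    """Yield (start, end, word) for every word in a transcription result."""
    for segment in result["segments"]:
        # "words" is only present when transcribe() ran with word_timestamps=True.
        for word in segment.get("words", []):
            # Whisper words usually carry a leading space; strip it for display.
            yield word["start"], word["end"], word["word"].strip()


# Hand-written sample mimicking the result structure (illustrative values).
sample_result = {
    "segments": [
        {
            "start": 0.0,
            "end": 1.2,
            "text": " Hello world",
            "words": [
                {"word": " Hello", "start": 0.0, "end": 0.5},
                {"word": " world", "start": 0.6, "end": 1.2},
            ],
        },
        # A segment without word-level data is skipped cleanly.
        {"start": 1.2, "end": 2.0, "text": " Bye"},
    ]
}

if __name__ == "__main__":
    for start, end, word in iter_words(sample_result):
        print(f"{start:6.2f}s - {end:6.2f}s  {word}")
```

In real use, pass the `result` returned by `model.transcribe(..., word_timestamps=True)` in place of `sample_result`.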

Output Formats

whisper audio.mp3 --output_format srt   # SubRip subtitles
whisper audio.mp3 --output_format vtt   # WebVTT subtitles
whisper audio.mp3 --output_format json  # Detailed JSON with segment timestamps (add --word_timestamps True for per-word timing)
whisper audio.mp3 --output_format txt   # Plain text
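To see what the SRT format involves, the segments returned by the Python API can be turned into subtitle text by hand. This is a sketch rather than the writer Whisper ships with: `srt_timestamp` and `segments_to_srt` are hypothetical helpers, and the only assumption is that each segment is a dict with `start`, `end` (seconds), and `text` keys.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from dicts that have start, end, and text keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_line = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        # Each cue is: running number, time range, subtitle text.
        blocks.append(f"{index}\n{time_line}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
        {"start": 2.5, "end": 5.0, "text": " This is a demo."},
    ]
    print(segments_to_srt(demo))
```

For plain file conversion the CLI commands above are simpler. A hand-rolled formatter like this is mainly useful when segments need filtering or merging before they are written out.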

FAQ

Q: What is Whisper? A: OpenAI's open-source speech recognition model, which transcribes audio to text in 99 languages with word-level timestamps. It has 75,000+ GitHub stars.

Q: Is Whisper free? A: Yes. Whisper is MIT-licensed and runs locally on your machine. No API costs.

Q: What languages does Whisper support? A: 99 languages including English, Chinese, Spanish, French, German, Japanese, Korean, Arabic, and more.



Source and acknowledgments

Created by OpenAI. Licensed under MIT. whisper — ⭐ 75,000+
