Skills2026年3月29日·1 分钟阅读

VideoCaptioner — AI Subtitle Pipeline

LLM-powered video subtitle tool: Whisper transcription + AI correction + 99-language translation + styled subtitle export. 13,800+ stars.

Script Depot · Community

Agent 就绪

Agent 可直接安装

这个资产可安装；Agent 先选择当前运行时、检查安装计划，再运行匹配命令。

Native · 98/100策略：允许

Agent 入口

任意 MCP/CLI Agent

类型

Skill

安装

Single

信任

信任等级：Established

入口

VideoCaptioner — AI Subtitle Pipeline

直接安装命令

npx -y tokrepo@latest install d12d8441-f0da-4d3d-a0c2-0f258b27336f --target codex

先 dry-run 确认安装计划，再运行此命令。

TL;DR

VideoCaptioner combines Whisper transcription, LLM-based correction, 99-language translation, and styled subtitle export in a single desktop application.

§01

What it is

VideoCaptioner is an open-source desktop application that automates the video subtitle workflow. It chains Whisper-based speech recognition, LLM-powered text correction, translation into 99 languages, and styled subtitle export (SRT, ASS, VTT) into a single pipeline. The tool provides a GUI for configuring each stage.

VideoCaptioner targets content creators, video editors, and localization teams who need accurate subtitles across languages. It handles the full lifecycle from raw audio to production-ready subtitle files without manual transcription or separate translation tools.

§02

How it saves time or tokens

Manual subtitle creation involves transcribing audio, correcting recognition errors, translating to target languages, and formatting subtitle files. VideoCaptioner automates all four stages. The LLM correction step catches Whisper's common errors (proper nouns, technical terms, homophones) without human review. Batch processing handles multiple videos sequentially.

§03

How to use

Download the latest release from GitHub or clone the repo and install dependencies with pip install -r requirements.txt.
Run python main.py to open the GUI. Configure your Whisper model and LLM API settings.
Load a video file, select source and target languages, and start the pipeline.

§04

Example

# Clone and set up
git clone https://github.com/WEIFENG2333/VideoCaptioner.git
cd VideoCaptioner
pip install -r requirements.txt
python main.py

# Pipeline stages:
# 1. Whisper transcribes audio to text
# 2. LLM corrects transcription errors
# 3. Translation to selected languages
# 4. Export as SRT, ASS, or VTT with styling

Stage	Input	Output
Transcription	Video/audio file	Raw text with timestamps
Correction	Raw transcript	Cleaned transcript
Translation	Cleaned text	Multi-language text
Export	Translated text	SRT/ASS/VTT files

§05

Related on TokRepo

Video AI Tools — AI-powered video production tools
Content AI Tools — Content creation and processing tools

§06

Common pitfalls

Whisper accuracy depends heavily on audio quality. Background music, overlapping speakers, and low-quality microphones reduce transcription accuracy.
LLM correction requires API access (OpenAI or compatible). Without it, you get raw Whisper output which may contain errors for domain-specific vocabulary.
Translation quality varies by language pair. Common pairs (English-Spanish, English-Chinese) produce better results than less common language combinations.

常见问题

Which Whisper model should I use?+

The large-v3 model provides the best accuracy but requires more GPU memory and processing time. The medium model offers a good balance for most content. For fast processing with acceptable quality, use the small model.

What LLM providers does VideoCaptioner support?+

VideoCaptioner supports OpenAI API and compatible endpoints. You can configure any OpenAI-compatible API (including local models via Ollama or LM Studio) for the correction and translation stages.

Can VideoCaptioner process audio-only files?+

Yes. VideoCaptioner accepts both video and audio files. For audio-only inputs, it skips the video processing and goes directly to transcription.

What subtitle formats does it export?+

VideoCaptioner exports SRT (SubRip), ASS (Advanced SubStation Alpha with styling), and VTT (WebVTT for web video). ASS format supports custom fonts, colors, and positioning.

Does VideoCaptioner run on GPU?+

Yes, when a CUDA-compatible GPU is available. Whisper uses GPU acceleration for faster transcription. The tool falls back to CPU processing if no GPU is detected, but processing time increases significantly.

引用来源 (3)

VideoCaptioner GitHub— VideoCaptioner combines Whisper transcription, LLM correction, and multi-languag…
Whisper GitHub— OpenAI Whisper speech recognition model
Whisper Paper— Automatic speech recognition and translation research

🙏

来源与感谢

Created by WEIFENG2333. Licensed under GPL-3.0. VideoCaptioner — ⭐ 13,800+

讨论

登录后参与讨论。

还没有评论，来写第一条吧。

VideoCaptioner — AI Subtitle Pipeline

Agent 可直接安装

What it is

How it saves time or tokens

How to use

Example

Related on TokRepo

Common pitfalls

常见问题

引用来源 (3)

TokRepo 相关

来源与感谢

讨论

相关资产

Data Juicer — Data Processing Pipeline for Foundation Models

Remotion Captions & Subtitles — AI-Powered Video Subtitles

Luigi — Python Pipeline Orchestration by Spotify

Pachyderm — Data Versioning and Pipeline Orchestration