Remotion Rule: Transcribe Captions
Remotion skill rule: Transcribing audio to generate captions in Remotion. Part of the official Remotion Agent Skill for programmatic video in React.
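Agent-installable
This asset is installable. The agent first selects the current runtime and reviews the install plan, then runs the matching command.
npx -y tokrepo@latest install 957e8918-a9ce-4b3e-97e1-6493d6ed75db --target codex
Do a dry run first to confirm the install plan, then run this command.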
What it is
This is a Remotion skill rule for transcribing audio files to generate timed captions in programmatic video projects. It uses the @remotion/install-whisper-cpp package, which installs and wraps Whisper.cpp for local speech-to-text transcription without sending audio to external APIs.
The rule is part of the official Remotion Agent Skill collection. It targets developers building React-based videos with Claude Code, Cursor, or OpenAI Codex who need automatic subtitles.
How it saves time or tokens
Manual caption creation is tedious and error-prone. This rule teaches AI coding assistants the exact sequence: install whisper-cpp, download a model, transcribe audio, and convert the output to Remotion's caption format. Without the rule, an assistant might suggest external transcription APIs or generate incorrect package imports. The rule ensures the assistant produces working transcription code on the first attempt.
How to use
- Install the Remotion skills collection:
npx skills add remotion-dev/skills
- Add the whisper-cpp package to your Remotion project:
npx remotion add @remotion/install-whisper-cpp
- Create a Node.js script that downloads the model, transcribes audio, and outputs caption data.
Example
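import path from 'path';
import {
  downloadWhisperModel,
  installWhisperCpp,
  transcribe,
  toCaptions,
} from '@remotion/install-whisper-cpp';

// Folder where Whisper.cpp is installed and where models are stored.
const whisperPath = path.join(process.cwd(), 'whisper.cpp');
// Whisper.cpp release to install; 1.5.5 is the version shown in the Remotion docs.
const whisperCppVersion = '1.5.5';

await installWhisperCpp({ to: whisperPath, version: whisperCppVersion });

await downloadWhisperModel({
  model: 'medium.en',
  folder: whisperPath,
});

// The input must be a 16 kHz WAV file.
const whisperCppOutput = await transcribe({
  inputPath: 'src/audio/narration.wav',
  whisperPath,
  whisperCppVersion,
  model: 'medium.en',
  tokenLevelTimestamps: true,
});

// toCaptions() returns an object with a `captions` array.
const { captions } = toCaptions({ whisperCppOutput });
console.log(JSON.stringify(captions, null, 2));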
This tool integrates with standard development workflows and requires minimal configuration to get started. It is available as open-source software with documentation and community support through the official repository. The project follows semantic versioning for stable releases.
For teams evaluating this tool, the key advantage is reducing manual work in repetitive tasks. The automation provided by the built-in features means less custom code to maintain and fewer integration points to manage. This translates directly to lower maintenance costs and faster iteration cycles.
Common pitfalls
- The medium.en model is English-only but faster; use medium or one of the large models for multilingual transcription at the cost of slower processing.
- Whisper.cpp compiles native code on first install, so your system needs a C++ toolchain (Xcode CLI tools on macOS, build-essential on Linux).
- Token-level timestamps (tokenLevelTimestamps: true) are required for word-by-word caption rendering; without them you only get segment-level timing.
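To illustrate why word-level timing matters, here is a minimal sketch of picking the caption to display at a given frame. The `Caption` type is a local stand-in whose field names are assumed to match the caption objects from @remotion/captions, and `activeCaptionAt` is a hypothetical helper, not part of any Remotion package.

```typescript
// Local stand-in for the caption shape; field names are an assumption
// based on the Caption type from @remotion/captions.
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Hypothetical helper: returns the caption visible at `frame`, or null.
function activeCaptionAt(
  captions: Caption[],
  frame: number,
  fps: number,
): Caption | null {
  // Convert the frame index to a position in milliseconds.
  const timeMs = (frame / fps) * 1000;
  const match = captions.find(
    (c) => timeMs >= c.startMs && timeMs < c.endMs,
  );
  return match ?? null;
}

// Sample word-level captions (made-up data for illustration).
const sampleWords: Caption[] = [
  {text: 'Hello', startMs: 0, endMs: 400, timestampMs: 200, confidence: 0.98},
  {text: ' world', startMs: 400, endMs: 900, timestampMs: 650, confidence: 0.95},
];

// At 30 fps, frame 15 is 500 ms into the video.
const current = activeCaptionAt(sampleWords, 15, 30);
console.log(current?.text);
```

In a composition, the frame and fps would typically come from Remotion's useCurrentFrame() and useVideoConfig() hooks. With segment-level timing only, each entry would span a whole phrase, so a lookup like this could not highlight individual words.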
FAQ
The transcribe function accepts WAV files, which Whisper.cpp expects to be sampled at 16 kHz. If your audio is in MP3 or AAC format, convert it to a 16 kHz WAV first using ffmpeg. Remotion's own audio extraction tools can also produce WAV output from video files.
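The conversion step can be sketched as a small argument builder for ffmpeg. `buildWavConversionArgs` is a hypothetical helper; the 16 kHz sample rate reflects what Whisper.cpp expects, while mono output and 16-bit PCM are assumptions chosen here as a conservative default.

```typescript
// Hypothetical helper: builds ffmpeg arguments that convert any audio input
// into a 16 kHz, mono, 16-bit PCM WAV file.
function buildWavConversionArgs(
  inputPath: string,
  outputPath: string,
): string[] {
  return [
    '-i', inputPath, // source file (MP3, AAC, video, ...)
    '-ar', '16000', // resample to 16 kHz
    '-ac', '1', // downmix to mono (assumption, not a stated requirement)
    '-c:a', 'pcm_s16le', // 16-bit little-endian PCM
    '-y', // overwrite the output file if it exists
    outputPath,
  ];
}

const ffmpegArgs = buildWavConversionArgs(
  'src/audio/narration.mp3',
  'src/audio/narration.wav',
);

// Print the equivalent shell command.
console.log(['ffmpeg', ...ffmpegArgs].join(' '));
```

In a Node.js script, the array could be passed to execFileSync('ffmpeg', ffmpegArgs) from node:child_process, which avoids shell-quoting problems with paths that contain spaces.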
Whisper.cpp supports GPU acceleration via Metal on macOS and CUDA on Linux. The installWhisperCpp function compiles with available GPU support automatically. GPU transcription is significantly faster for longer audio files.
Accuracy depends on the model size. The medium.en model provides good accuracy for clear English speech. The large model improves accuracy for accented speech and noisy audio but takes longer to process.
No API key or external service is needed. Whisper.cpp runs entirely locally on your machine. There are no API calls, no rate limits, and no costs beyond compute time. Your audio never leaves your system.
Captions can be edited before rendering. The toCaptions function returns an object whose captions property is an array of timed caption objects. You can filter, merge, or modify entries programmatically before passing them to your Remotion composition for rendering.
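As a rough sketch of such post-processing, the following filters and merges an array of caption objects. The `Caption` type is a local stand-in with field names assumed to match @remotion/captions; `dropWeakCaptions` and `mergeCloseCaptions` are hypothetical helpers, and the thresholds are arbitrary.

```typescript
// Local stand-in for the caption shape; field names are an assumption
// based on the Caption type from @remotion/captions.
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Hypothetical helper: drops blank entries and entries below a confidence threshold.
function dropWeakCaptions(
  captions: Caption[],
  minConfidence: number,
): Caption[] {
  return captions.filter(
    (c) =>
      c.text.trim().length > 0 &&
      (c.confidence === null || c.confidence >= minConfidence),
  );
}

// Hypothetical helper: merges a caption into the previous one when the
// silence between them is at most `maxGapMs`.
function mergeCloseCaptions(captions: Caption[], maxGapMs: number): Caption[] {
  const merged: Caption[] = [];
  for (const caption of captions) {
    const previous = merged[merged.length - 1];
    if (previous !== undefined && caption.startMs - previous.endMs <= maxGapMs) {
      merged[merged.length - 1] = {
        ...previous,
        text: previous.text + caption.text,
        endMs: caption.endMs,
      };
    } else {
      merged.push({...caption});
    }
  }
  return merged;
}

// Made-up sample data for illustration.
const raw: Caption[] = [
  {text: 'Hello', startMs: 0, endMs: 400, timestampMs: 200, confidence: 0.98},
  {text: ' world', startMs: 420, endMs: 900, timestampMs: 650, confidence: 0.95},
  {text: ' ', startMs: 900, endMs: 950, timestampMs: null, confidence: 0.2},
  {text: ' Next', startMs: 2000, endMs: 2400, timestampMs: 2200, confidence: 0.9},
];

const edited = mergeCloseCaptions(dropWeakCaptions(raw, 0.5), 100);
console.log(edited.map((c) => c.text));
```

Both helpers return new arrays and leave the input untouched, so the raw transcription can be kept alongside the edited version.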
Source and thanks
Created by Remotion. Licensed under MIT. remotion-dev/skills, rule: transcribe-captions
Part of the Remotion AI Skill collection on TokRepo.