Remotion Rule: Transcribe Captions
Remotion skill rule: Transcribing audio to generate captions in Remotion. Part of the official Remotion Agent Skill for programmatic video in React.
What it is
This is a Remotion skill rule for transcribing audio files to generate timed captions in programmatic video projects. It uses the @remotion/install-whisper-cpp package, which installs and wraps Whisper.cpp for local speech-to-text transcription without sending audio to external APIs.
The rule is part of the official Remotion Agent Skill collection. It targets developers building React-based videos with Claude Code, Cursor, or OpenAI Codex who need automatic subtitles.
How it saves time or tokens
Manual caption creation is tedious and error-prone. This rule teaches AI coding assistants the exact sequence: install whisper-cpp, download a model, transcribe audio, and convert the output to Remotion's caption format. Without the rule, an assistant might suggest external transcription APIs or generate incorrect package imports. The rule ensures the assistant produces working transcription code on the first attempt.
How to use
- Install the Remotion skills collection:
    npx skills add remotion-dev/skills
- Add the whisper-cpp package to your Remotion project:
    npx remotion add @remotion/install-whisper-cpp
- Create a Node.js script that downloads the model, transcribes audio, and outputs caption data.
Example
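// Run as an ES module (for example a .mjs file), since this script uses top-level await.
import path from 'path';
import {
  downloadWhisperModel,
  installWhisperCpp,
  transcribe,
  toCaptions,
} from '@remotion/install-whisper-cpp';

// Folder where Whisper.cpp is compiled and where models are stored.
const whisperPath = path.join(process.cwd(), 'whisper.cpp');

// The pinned version is only an example; use the one your project targets.
await installWhisperCpp({ to: whisperPath, version: '1.5.5' });

await downloadWhisperModel({
  model: 'medium.en',
  folder: whisperPath,
});

const result = await transcribe({
  inputPath: 'src/audio/narration.wav', // must be a 16 kHz WAV file
  whisperPath,
  whisperCppVersion: '1.5.5', // keep in sync with the installed version
  model: 'medium.en',
  tokenLevelTimestamps: true,
});

// toCaptions() takes the transcribe() output and returns an object with a `captions` array.
const { captions } = toCaptions({ whisperCppOutput: result });
console.log(JSON.stringify(captions, null, 2));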
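Remotion compositions are driven by frame numbers, while the caption data is timed in milliseconds, so a conversion step is usually needed before rendering. The following is a minimal sketch of that step, not part of the rule itself. The `Caption` type here is a local assumption that mirrors only the fields used, and `toFrameRanges` is a hypothetical helper name.

```typescript
// Assumed caption shape: only the fields this sketch relies on.
type Caption = { text: string; startMs: number; endMs: number };

type FrameRange = { text: string; from: number; durationInFrames: number };

// Convert millisecond timings to frame ranges for a given frame rate.
// Start frames round down and end frames round up, so a caption is
// never shorter than one frame.
const toFrameRanges = (captions: Caption[], fps: number): FrameRange[] =>
  captions.map((c) => {
    const from = Math.floor((c.startMs / 1000) * fps);
    const end = Math.ceil((c.endMs / 1000) * fps);
    return { text: c.text, from, durationInFrames: Math.max(1, end - from) };
  });
```

Each resulting `from` and `durationInFrames` pair could then be passed to a timed wrapper component in the composition.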
Common pitfalls
- The medium.en model is English-only but faster; use medium or large for multilingual transcription at the cost of slower processing.
- Whisper.cpp compiles native code on first install, so your system needs a C++ toolchain (Xcode CLI tools on macOS, build-essential on Linux).
- Token-level timestamps (tokenLevelTimestamps: true) are required for word-by-word caption rendering; without them you only get segment-level timing.
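To illustrate why word-level timing matters for word-by-word rendering, here is a small sketch that groups word captions into display lines and finds the word being spoken at a given time. It assumes each caption holds one whole word, possibly with leading whitespace. The types and the helper names `groupIntoLines` and `activeWordIndex` are hypothetical and not part of any Remotion package.

```typescript
// Assumed shape of a word-level caption; only the fields used here.
type WordCaption = { text: string; startMs: number; endMs: number };

type CaptionLine = {
  text: string;
  startMs: number;
  endMs: number;
  words: WordCaption[];
};

// Split word-level captions into lines of at most `maxWords` words.
const groupIntoLines = (words: WordCaption[], maxWords: number): CaptionLine[] => {
  if (maxWords < 1) throw new Error('maxWords must be at least 1');
  const lines: CaptionLine[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    const chunk = words.slice(i, i + maxWords);
    lines.push({
      text: chunk.map((w) => w.text.trim()).join(' '),
      startMs: chunk[0].startMs,
      endMs: chunk[chunk.length - 1].endMs,
      words: chunk,
    });
  }
  return lines;
};

// Index of the word being spoken at `timeMs`, or -1 if none is active.
const activeWordIndex = (line: CaptionLine, timeMs: number): number =>
  line.words.findIndex((w) => timeMs >= w.startMs && timeMs < w.endMs);
```

With segment-level timing only, `activeWordIndex` would have nothing to work with, because every word in a line would share the same start and end time.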
Frequently Asked Questions
The transcribe function accepts WAV files sampled at 16 kHz. If your audio is in MP3 or AAC format, convert it first with ffmpeg, for example: ffmpeg -i input.mp3 -ar 16000 output.wav -y. Remotion's own audio extraction tools can also produce WAV output from video files.
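If the conversion needs to happen inside the same Node.js script, the ffmpeg arguments can be assembled programmatically. This is a sketch under two assumptions: ffmpeg is available on the PATH, and the target is a 16 kHz WAV file as Whisper.cpp expects. `ffmpegWavArgs` is a hypothetical helper, not a Remotion API.

```typescript
// Build ffmpeg arguments that resample any input to a 16 kHz WAV file.
const ffmpegWavArgs = (inputPath: string, outputPath: string): string[] => {
  if (!outputPath.toLowerCase().endsWith('.wav')) {
    throw new Error('outputPath should end in .wav');
  }
  // -ar sets the audio sample rate; -y overwrites an existing output file.
  return ['-i', inputPath, '-ar', '16000', outputPath, '-y'];
};

// Print the equivalent shell command for inspection.
const exampleArgs = ffmpegWavArgs('narration.mp3', 'narration.wav');
console.log(['ffmpeg', ...exampleArgs].join(' '));
// prints "ffmpeg -i narration.mp3 -ar 16000 narration.wav -y"
```

The returned array can be handed to Node's `execFileSync('ffmpeg', args)` from `child_process`, which avoids shell-quoting issues with file names that contain spaces.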
Whisper.cpp supports GPU acceleration via Metal on macOS and CUDA on Linux. The installWhisperCpp function compiles with available GPU support automatically. GPU transcription is significantly faster for longer audio files.
Accuracy depends on the model size. The medium.en model provides good accuracy for clear English speech. The large model improves accuracy for accented speech and noisy audio but takes longer to process.
Transcription does not require an external service or API key. Whisper.cpp runs entirely locally on your machine. There are no API calls, no rate limits, and no costs beyond compute time. Your audio never leaves your system.
Captions can be edited before rendering. The toCaptions function returns an object whose captions field is an array of timed caption objects. You can filter, merge, or modify entries programmatically before passing them to your Remotion composition for rendering.
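As one possible example of such post-processing, the sketch below drops empty entries and merges punctuation-only entries into the preceding caption. The `Caption` type is a local assumption covering only the fields used, and `cleanCaptions` is a hypothetical helper. Real projects may need different rules depending on how the model tokenizes speech.

```typescript
// Assumed caption shape: only the fields this sketch relies on.
type Caption = { text: string; startMs: number; endMs: number };

// Drop blank captions and fold punctuation-only captions into the previous one.
const cleanCaptions = (captions: Caption[]): Caption[] => {
  const out: Caption[] = [];
  for (const c of captions) {
    const trimmed = c.text.trim();
    if (trimmed === '') continue; // filter: skip empty entries

    const prev = out[out.length - 1];
    if (prev && /^[.,!?;:]+$/.test(trimmed)) {
      // merge: attach punctuation to the previous caption and extend its end time
      out[out.length - 1] = { ...prev, text: prev.text + trimmed, endMs: c.endMs };
      continue;
    }
    out.push({ ...c });
  }
  return out;
};
```

The function returns a new array and leaves the input untouched, so the raw transcription output stays available for other uses.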
Source & Thanks
Created by Remotion. Licensed under MIT. remotion-dev/skills, rule: transcribe-captions
Part of the Remotion AI Skill collection on TokRepo.