Remotion Captions & Subtitles — AI-Powered Video Subtitles
AI skill for generating and rendering captions in Remotion videos. Supports transcription, word-level timing, and styled subtitle export.
What it is
Remotion Captions is an AI skill for generating and rendering subtitles in Remotion videos. It covers transcription with word-level timing, styled subtitle rendering, and export, all within Remotion's React-based video framework. You feed it audio, and it produces timed captions that render as animated text overlays in your video.
This skill targets developers building programmatic videos with Remotion who need automated captioning. It is part of the official Remotion Agent Skill set, designed to work within AI-assisted video production workflows.
Why it saves time or tokens
Manually timing subtitles frame by frame is one of the most tedious tasks in video production. This skill automates the pipeline: transcribe audio, align words to timestamps, and render styled captions. For AI agents generating videos programmatically, this can reduce a multi-hour manual process to a short scripted step.
How to use
- Set up a Remotion project with the @remotion/captions package installed
- Provide the audio source for transcription (supports Whisper and other transcription backends)
- Use the caption components in your Remotion composition to render timed subtitles
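Between transcription and rendering, word-level output usually has to be grouped into the short lines that appear on screen together. The following is a minimal sketch of that step, not the package's own implementation: `TimedWord`, `CaptionPage`, and `groupIntoPages` are hypothetical names, and the `{ text, startMs, endMs }` shape is assumed from the example on this page.

```typescript
// Local stand-in for a caption entry; mirrors the { text, startMs, endMs }
// shape used on this page (an assumption, not imported from the package).
interface TimedWord {
  text: string;
  startMs: number;
  endMs: number;
}

interface CaptionPage {
  text: string;
  startMs: number;
  endMs: number;
  words: TimedWord[];
}

// Hypothetical helper: start a new page whenever the pause before a word
// exceeds maxGapMs or the current page already holds maxWords words.
function groupIntoPages(
  words: TimedWord[],
  maxGapMs: number,
  maxWords: number,
): CaptionPage[] {
  const pages: CaptionPage[] = [];
  for (const word of words) {
    const current = pages.length > 0 ? pages[pages.length - 1] : undefined;
    if (
      current !== undefined &&
      word.startMs - current.endMs <= maxGapMs &&
      current.words.length < maxWords
    ) {
      // Extend the current page with this word.
      current.words.push(word);
      current.text += ' ' + word.text;
      current.endMs = word.endMs;
    } else {
      pages.push({
        text: word.text,
        startMs: word.startMs,
        endMs: word.endMs,
        words: [word],
      });
    }
  }
  return pages;
}

const demoPages = groupIntoPages(
  [
    { text: 'Hello', startMs: 0, endMs: 400 },
    { text: 'world', startMs: 450, endMs: 900 },
    { text: 'This', startMs: 1600, endMs: 1900 },
    { text: 'is', startMs: 1950, endMs: 2100 },
    { text: 'automated', startMs: 2150, endMs: 3000 },
  ],
  500, // maxGapMs: a pause longer than this starts a new page
  4, // maxWords: upper bound on words per page
);
// demoPages has two pages: "Hello world" (0-900 ms) and
// "This is automated" (1600-3000 ms).
```

The gap threshold and word limit are tuning knobs. Shorter pages read better on vertical video, while longer pages suit wide layouts.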
Example
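    import React from 'react';
    import type { Caption } from '@remotion/captions';
    import { AbsoluteFill, useCurrentFrame, useVideoConfig } from 'remotion';

    // Caption data in the shape of the Caption type (times in milliseconds).
    const captions: Caption[] = [
      { text: 'Hello world', startMs: 0, endMs: 1500, timestampMs: null, confidence: null },
      { text: 'This is automated', startMs: 1600, endMs: 3000, timestampMs: null, confidence: null },
    ];

    export const MyVideo: React.FC = () => {
      const frame = useCurrentFrame();
      const { fps } = useVideoConfig();

      // Convert the current frame to milliseconds and pick the caption on screen.
      const timeMs = (frame / fps) * 1000;
      const active = captions.find((c) => timeMs >= c.startMs && timeMs < c.endMs);

      return (
        <AbsoluteFill style={{ justifyContent: 'flex-end', alignItems: 'center' }}>
          {active ? (
            <div
              style={{
                fontSize: 48,
                color: 'white',
                textShadow: '2px 2px 4px black',
              }}
            >
              {active.text}
            </div>
          ) : null}
        </AbsoluteFill>
      );
    };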
| Feature | Description |
|---|---|
| Word-level timing | Each word has start/end timestamps |
| Custom styling | CSS-based font, color, shadow |
| Animation | Fade, pop, highlight effects |
| Multi-language | Works with any transcription language |
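The word-level timing and animation rows above come down to simple arithmetic on timestamps. The sketch below shows one way to find the word to highlight and to compute a fade opacity. `activeWordIndex`, `fadeOpacity`, and `frameToMs` are hypothetical helpers written for illustration. In a real Remotion component, the `interpolate` helper from `remotion` would usually handle the fade.

```typescript
// Local stand-in for a word-level caption entry (shape assumed from this page).
interface TimedWord {
  text: string;
  startMs: number;
  endMs: number;
}

// Convert a frame number to milliseconds for a given frame rate.
function frameToMs(frame: number, fps: number): number {
  return (frame / fps) * 1000;
}

// Index of the word being spoken at timeMs, or -1 during a pause.
function activeWordIndex(words: TimedWord[], timeMs: number): number {
  return words.findIndex((w) => timeMs >= w.startMs && timeMs < w.endMs);
}

// Linear fade: ramps 0 -> 1 over fadeMs after startMs and 1 -> 0 over
// fadeMs before endMs. Returns 0 outside the [startMs, endMs] range.
function fadeOpacity(
  timeMs: number,
  startMs: number,
  endMs: number,
  fadeMs: number,
): number {
  if (timeMs < startMs || timeMs > endMs) return 0;
  if (fadeMs <= 0) return 1; // no fade requested
  const fadeIn = (timeMs - startMs) / fadeMs;
  const fadeOut = (endMs - timeMs) / fadeMs;
  return Math.max(0, Math.min(1, fadeIn, fadeOut));
}

const demoWords: TimedWord[] = [
  { text: 'Hello', startMs: 0, endMs: 400 },
  { text: 'world', startMs: 450, endMs: 900 },
];

// Frame 15 at 30 fps is 500 ms, which falls inside "world".
const highlighted = activeWordIndex(demoWords, frameToMs(15, 30)); // 1
// 100 ms into a 200 ms fade-in gives half opacity.
const opacity = fadeOpacity(100, 0, 1500, 200); // 0.5
```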
Related on TokRepo
- AI tools for video — video production and editing tools curated on TokRepo
- AI tools for content — content creation tools for AI-assisted workflows
Common pitfalls
- Transcription accuracy depends on audio quality; noisy backgrounds or multiple speakers reduce word-level timing precision
- Custom fonts must be loaded before rendering; use the Remotion font loading system or captions will render in the default system font
- Long videos with many caption segments can slow down the Remotion preview; use composition splitting for videos over 5 minutes
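For the long-video pitfall, one possible approach is to bucket captions into fixed time windows and rebase their timestamps, so each window can be previewed or rendered on its own (for example in a separate composition or `<Sequence>`). This is a sketch under that assumption. `splitIntoChunks` and `CaptionChunk` are hypothetical names, and the 5-minute window simply mirrors the threshold mentioned above.

```typescript
// Local stand-in for a caption entry (shape assumed from this page).
interface TimedCaption {
  text: string;
  startMs: number;
  endMs: number;
}

interface CaptionChunk {
  offsetMs: number; // where this chunk starts in the full video
  captions: TimedCaption[]; // timestamps relative to offsetMs
}

// Hypothetical helper: assign each caption to the window in which it starts.
// A caption that straddles a boundary stays in its starting window, so its
// relative endMs may exceed windowMs.
function splitIntoChunks(
  captions: TimedCaption[],
  windowMs: number,
): CaptionChunk[] {
  if (windowMs <= 0) {
    throw new Error('windowMs must be positive');
  }
  const chunks = new Map<number, CaptionChunk>();
  for (const c of captions) {
    const index = Math.floor(c.startMs / windowMs);
    const offsetMs = index * windowMs;
    let chunk = chunks.get(index);
    if (chunk === undefined) {
      chunk = { offsetMs, captions: [] };
      chunks.set(index, chunk);
    }
    chunk.captions.push({
      text: c.text,
      startMs: c.startMs - offsetMs,
      endMs: c.endMs - offsetMs,
    });
  }
  // Return chunks in chronological order.
  return Array.from(chunks.values()).sort((a, b) => a.offsetMs - b.offsetMs);
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;
const demoChunks = splitIntoChunks(
  [
    { text: 'Intro', startMs: 1000, endMs: 2500 },
    { text: 'Crossing the boundary', startMs: 299000, endMs: 301000 },
    { text: 'Second part', startMs: 305000, endMs: 306000 },
  ],
  FIVE_MINUTES_MS,
);
// demoChunks[0] holds the first two captions. demoChunks[1] holds
// "Second part" rebased to 5000-6000 ms with offsetMs 300000.
```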
Frequently Asked Questions
Which transcription services does Remotion Captions work with?
Remotion Captions works with transcription output from Whisper, Deepgram, and other services that produce word-level timestamps. The caption system consumes a JSON array of timed words, so any transcription backend that outputs start and end times per word is compatible.
Can I customize how the captions look?
Yes. Captions are rendered as React components with full CSS styling support. You control font size, color, background, shadow, position, and animation. Since Remotion uses React, you can build custom caption components with any visual effect that CSS and React support.
Can AI agents use this skill in an automated pipeline?
Yes. This skill is designed for the Remotion Agent Skill ecosystem. An AI agent can generate a video script, synthesize speech, transcribe it for timing, and render captions automatically. The entire pipeline can run without human intervention.
How accurate is the word-level timing?
Accuracy depends on the transcription backend. Whisper typically places word boundaries within 50-100 ms on clean audio. Background noise, music, or overlapping speech reduces accuracy. Post-processing alignment tools can improve timing for difficult audio.
Can I render captions in more than one language?
Yes. You can render multiple caption tracks by providing a separate transcription output for each language. Remotion can render multiple subtitle components simultaneously, so you can show bilingual captions, or select the track through a prop so that each render (or an interactive player embed) shows a different language.
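Since the caption system consumes a JSON array of timed words, output from a transcription backend often needs a small normalization pass first. The sketch below assumes a hypothetical input shape (`TranscribedWord`, with times in seconds and possible padding whitespace) and converts it to millisecond-based entries, sorting them and trimming overlaps. Real backends differ, so the field names here are illustrative only.

```typescript
// Hypothetical input shape: one entry per word, times in seconds.
interface TranscribedWord {
  word: string;
  start: number;
  end: number;
}

// Local stand-in for a caption entry (shape assumed from this page).
interface TimedCaption {
  text: string;
  startMs: number;
  endMs: number;
}

// Hypothetical helper: convert seconds to milliseconds, drop empty tokens,
// sort by start time, and clamp each end so captions never overlap.
function toCaptions(words: TranscribedWord[]): TimedCaption[] {
  const sorted: TimedCaption[] = words
    .map((w) => ({
      text: w.word.trim(),
      startMs: Math.round(w.start * 1000),
      endMs: Math.round(w.end * 1000),
    }))
    .filter((c) => c.text.length > 0)
    .sort((a, b) => a.startMs - b.startMs);

  return sorted.map((c, i) => {
    // End no later than the next caption's start, and never before our own start.
    const limit = i + 1 < sorted.length ? sorted[i + 1].startMs : c.endMs;
    const endMs = Math.max(c.startMs, Math.min(c.endMs, limit));
    return { text: c.text, startMs: c.startMs, endMs };
  });
}

const demoCaptions = toCaptions([
  { word: ' world', start: 0.45, end: 0.9 },
  { word: 'Hello', start: 0, end: 0.5 }, // out of order and overlapping
  { word: ' ', start: 1.0, end: 1.1 }, // whitespace-only token, dropped
]);
// demoCaptions: "Hello" (0-450 ms), then "world" (450-900 ms).
```

Clamping the end time rather than shifting the start keeps each word's onset as reported by the transcription backend. The onset is usually what viewers notice most.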
Source & Thanks
Created by Remotion. Licensed under MIT. remotion-dev/skills — Subtitles rule