Skills · March 29, 2026 · 1 min read

Remotion Captions & Subtitles — AI-Powered Video Subtitles

AI skill for generating and rendering captions in Remotion videos. Supports transcription, word-level timing, and styled subtitle export.

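Agent ready

Directly installable by agents

This asset is installable. The agent first selects the current runtime, reviews the install plan, then runs the matching command.

Native · 98/100 · Policy: allowed
Agent entry point: Any MCP/CLI agent
Type: Skill
Install: Single
Trust level: Established
Entry: Remotion Captions & Subtitles Skill
Direct install command:
npx -y tokrepo@latest install 7775f06a-8adf-477a-91e9-85f51682cd10 --target codex

Do a dry run to confirm the install plan before running this command.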

TL;DR
Generate and render word-level captions in Remotion videos with AI transcription.
§01

What it is

Remotion Captions is an AI skill for generating and rendering subtitles in Remotion videos. It covers transcription with word-level timing, styled subtitle rendering, and subtitle export within Remotion, the React-based video framework. You feed it audio, and it produces timed captions that render as animated text overlays in your video.

This skill targets developers building programmatic videos with Remotion who need automated captioning. It is part of the official Remotion Agent Skill set, designed to work within AI-assisted video production workflows.

§02

Why it saves time or tokens

Manually timing subtitles frame-by-frame is one of the most tedious tasks in video production. This skill automates the entire pipeline: transcribe audio, align words to timestamps, and render styled captions. For AI agents generating videos programmatically, this reduces a multi-hour manual process to a single function call.

§03

How to use

  1. Set up a Remotion project with the @remotion/captions package installed
  2. Provide the audio source for transcription (supports Whisper and other transcription backends)
  3. Render the timed captions as React elements in your Remotion composition
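
The rendering step in the list above comes down to mapping the current frame to a time and picking the caption that covers it. The following is a minimal sketch of that lookup, assuming captions carry `startMs` and `endMs` fields as in the example in this article. The `TimedCaption` type and the helper names `frameToMs` and `activeCaptionAt` are hypothetical and defined locally, so the snippet runs without Remotion installed.

```typescript
// Local stand-in for the caption shape used in this article (an assumption,
// not imported from @remotion/captions, so the snippet is self-contained).
type TimedCaption = { text: string; startMs: number; endMs: number };

// Convert a frame index to milliseconds at the given frame rate.
function frameToMs(frame: number, fps: number): number {
  return (frame / fps) * 1000;
}

// Return the caption whose [startMs, endMs) interval contains the frame,
// or null when no caption is on screen at that moment.
function activeCaptionAt(
  captions: TimedCaption[],
  frame: number,
  fps: number,
): TimedCaption | null {
  const timeMs = frameToMs(frame, fps);
  return captions.find((c) => timeMs >= c.startMs && timeMs < c.endMs) ?? null;
}

const demoCaptions: TimedCaption[] = [
  { text: 'Hello world', startMs: 0, endMs: 1500 },
  { text: 'This is automated', startMs: 1600, endMs: 3000 },
];

// Frame 30 at 30 fps corresponds to 1000 ms, inside the first caption.
console.log(activeCaptionAt(demoCaptions, 30, 30)?.text);
```

Inside a composition, the frame and fps would come from Remotion's `useCurrentFrame()` and `useVideoConfig()` hooks rather than being passed in by hand.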
§04

Example

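import type { Caption } from '@remotion/captions';
import { AbsoluteFill, useCurrentFrame, useVideoConfig } from 'remotion';

// Caption data in the shape of the Caption type from @remotion/captions.
const captions: Caption[] = [
  { text: 'Hello world', startMs: 0, endMs: 1500, timestampMs: null, confidence: null },
  { text: 'This is automated', startMs: 1600, endMs: 3000, timestampMs: null, confidence: null },
];

export const MyVideo = () => {
  const frame = useCurrentFrame();
  const { fps } = useVideoConfig();
  // Current playback time in milliseconds.
  const timeMs = (frame / fps) * 1000;
  // The caption (if any) that should be on screen at this frame.
  const active = captions.find((c) => timeMs >= c.startMs && timeMs < c.endMs);

  return (
    <AbsoluteFill style={{ justifyContent: 'flex-end', alignItems: 'center' }}>
      {active ? (
        <div style={{ fontSize: 48, color: 'white', textShadow: '2px 2px 4px black' }}>
          {active.text}
        </div>
      ) : null}
    </AbsoluteFill>
  );
};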

Feature            Description
Word-level timing  Each word has start/end timestamps
Custom styling     CSS-based font, color, shadow
Animation          Fade, pop, highlight effects
Multi-language     Works with any transcription language
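
Word-level timing is what makes highlight effects possible: words are grouped into short on-screen "pages", and the word being spoken is picked out by time. The following is a rough sketch of that idea, not the package's own grouping logic. The types `CaptionWord` and `CaptionPage` and the functions `groupWordsIntoPages` and `activeWordIndex` are hypothetical names introduced for illustration.

```typescript
// Hypothetical shapes for this sketch; not imported from any library.
type CaptionWord = { text: string; startMs: number; endMs: number };
type CaptionPage = { text: string; startMs: number; endMs: number; words: CaptionWord[] };

// Group consecutive words into pages that each span at most maxPageMs.
function groupWordsIntoPages(words: CaptionWord[], maxPageMs: number): CaptionPage[] {
  const pages: CaptionPage[] = [];
  for (const word of words) {
    const current = pages[pages.length - 1];
    if (current && word.endMs - current.startMs <= maxPageMs) {
      // The word still fits in the current page: extend it.
      current.words.push(word);
      current.text += ' ' + word.text;
      current.endMs = word.endMs;
    } else {
      // Otherwise start a new page with this word.
      pages.push({ text: word.text, startMs: word.startMs, endMs: word.endMs, words: [word] });
    }
  }
  return pages;
}

// Index of the word being spoken at timeMs, or -1 if none is active.
function activeWordIndex(page: CaptionPage, timeMs: number): number {
  return page.words.findIndex((w) => timeMs >= w.startMs && timeMs < w.endMs);
}

const demoWords: CaptionWord[] = [
  { text: 'Hello', startMs: 0, endMs: 400 },
  { text: 'world', startMs: 450, endMs: 900 },
  { text: 'this', startMs: 1300, endMs: 1500 },
  { text: 'is', startMs: 1550, endMs: 1700 },
  { text: 'automated', startMs: 1750, endMs: 2600 },
];

const pages = groupWordsIntoPages(demoWords, 1000);
console.log(pages.map((p) => p.text));
```

A caption component could then render one page at a time and apply a different style to the word at `activeWordIndex`.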
§05

Common pitfalls

  • Transcription accuracy depends on audio quality; noisy backgrounds or multiple speakers reduce word-level timing precision
  • Custom fonts must be loaded before rendering; use the Remotion font loading system or captions will render in the default system font
  • Long videos with many caption segments can slow down the Remotion preview; use composition splitting for videos over 5 minutes
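
One way to read the splitting advice above is to partition the caption list into fixed-length segments, each with timestamps relative to its own start, so every segment can be previewed on its own. The sketch below assumes that interpretation. `SegmentCaption` and `splitIntoSegments` are hypothetical names, and a caption that straddles a boundary stays in the segment where it starts.

```typescript
// Hypothetical caption shape for this sketch.
type SegmentCaption = { text: string; startMs: number; endMs: number };

// Bucket captions by the segment that contains their start time and
// rebase their timestamps to the start of that segment.
function splitIntoSegments(
  captions: SegmentCaption[],
  segmentMs: number,
): Map<number, SegmentCaption[]> {
  const segments = new Map<number, SegmentCaption[]>();
  for (const c of captions) {
    const index = Math.floor(c.startMs / segmentMs);
    const offset = index * segmentMs;
    const bucket = segments.get(index) ?? [];
    bucket.push({ text: c.text, startMs: c.startMs - offset, endMs: c.endMs - offset });
    segments.set(index, bucket);
  }
  return segments;
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;
const longCaptions: SegmentCaption[] = [
  { text: 'Intro', startMs: 1000, endMs: 2000 },
  { text: 'Near the boundary', startMs: 299500, endMs: 300500 },
  { text: 'Second segment', startMs: 301000, endMs: 302000 },
];

const segments = splitIntoSegments(longCaptions, FIVE_MINUTES_MS);
console.log(segments.size);
```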

FAQ

What transcription services does Remotion Captions support?

Remotion Captions works with transcription output from Whisper, Deepgram, and other services that produce word-level timestamps. The caption system consumes a JSON array of timed words, so any transcription backend that outputs start and end times per word is compatible.
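
As an illustration of that compatibility, the sketch below normalizes per-word transcription output into the millisecond-based shape used in this article. The input shape (`word`, `start`, `end`, with times in seconds) is an assumption about what a backend might return, and `RawWord`, `NormalizedWord`, and `toTimedWords` are hypothetical names.

```typescript
// Assumed backend output: one entry per word, times in seconds.
type RawWord = { word: string; start: number; end: number };
// Shape used by the caption examples in this article: times in milliseconds.
type NormalizedWord = { text: string; startMs: number; endMs: number };

// Trim whitespace, convert seconds to whole milliseconds, and drop
// entries that are empty or have no duration.
function toTimedWords(raw: RawWord[]): NormalizedWord[] {
  return raw
    .map((w) => ({
      text: w.word.trim(),
      startMs: Math.round(w.start * 1000),
      endMs: Math.round(w.end * 1000),
    }))
    .filter((w) => w.text.length > 0 && w.endMs > w.startMs);
}

const rawDemo: RawWord[] = [
  { word: ' Hello', start: 0, end: 0.42 },
  { word: ' world', start: 0.48, end: 0.9 },
  { word: ' ', start: 0.9, end: 0.9 },
];

const normalized = toTimedWords(rawDemo);
console.log(normalized);
```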

Can I customize the subtitle appearance?

Yes. Captions are rendered as React components with full CSS styling support. You control font size, color, background, shadow, position, and animation. Since Remotion uses React, you can build custom caption components with any visual effect that CSS and React support.

Does Remotion Captions work with AI video generation agents?

Yes. This skill is designed for the Remotion Agent Skill ecosystem. An AI agent can generate a video script, synthesize speech, transcribe it for timing, and render captions automatically. The entire pipeline can run without human intervention.

How accurate is the word-level timing?

Accuracy depends on the transcription backend. On clean audio, Whisper word boundaries are typically accurate to within 50-100 ms. Background noise, music, or overlapping speech reduces accuracy. Post-processing alignment tools can improve timing for difficult audio.
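
A very simple form of post-processing is to make sure word intervals never overlap, since overlapping timestamps can cause two words to appear highlighted at once. The sketch below shows only that cleanup step, not a real forced-alignment tool. `AlignedWord` and `resolveOverlaps` are hypothetical names.

```typescript
// Hypothetical word shape for this sketch.
type AlignedWord = { text: string; startMs: number; endMs: number };

// Sort words by start time and trim any word that runs into the next one.
// The input array is left untouched; copies are returned.
function resolveOverlaps(words: AlignedWord[]): AlignedWord[] {
  const sorted = [...words].sort((a, b) => a.startMs - b.startMs);
  const result: AlignedWord[] = [];
  for (const word of sorted) {
    const prev = result[result.length - 1];
    if (prev && prev.endMs > word.startMs) {
      // End the previous word where this one begins.
      prev.endMs = word.startMs;
    }
    result.push({ ...word });
  }
  return result;
}

const overlapping: AlignedWord[] = [
  { text: 'Hello', startMs: 0, endMs: 500 },
  { text: 'world', startMs: 450, endMs: 900 },
  { text: 'again', startMs: 900, endMs: 1200 },
];

const resolved = resolveOverlaps(overlapping);
console.log(resolved.map((w) => `${w.text}: ${w.startMs}-${w.endMs}`));
```

Two words with the same start time would leave the earlier one with zero duration, so a fuller version would likely need a minimum-duration rule as well.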

Can I add captions in multiple languages?

Yes. You can render multiple caption tracks by providing a separate transcription output for each language. Remotion can render several subtitle components at the same time, so you can show bilingual captions in one video or render a separate output per language.

References (3)
  • Remotion GitHub: Remotion is a framework for making videos programmatically in React
  • Remotion Docs: @remotion/captions package for subtitle rendering
  • OpenAI Whisper: Whisper provides word-level transcription timestamps
Source and credits

Created by Remotion. Licensed under MIT. Source: remotion-dev/skills, Subtitles rule.
