Skills · March 29, 2026 · 1 min read

Remotion Rule: Voiceover

Remotion skill rule: Adding AI-generated voiceover to Remotion compositions using TTS. Part of the official Remotion Agent Skill for programmatic video in React.

Safe staging command
npx -y tokrepo@latest install 47ec82a6-f5e7-4530-94cd-3de1508ccb43 --target codex

Stage the files first; read the staged README and install plan before activating anything.

TL;DR
Add AI voiceover to Remotion compositions using ElevenLabs TTS with automatic duration matching.
§01

What it is

The Remotion Voiceover Rule is a skill rule from the official Remotion Agent Skill set. It provides a structured pattern for adding AI-generated speech audio to Remotion video compositions using text-to-speech (TTS) services, with ElevenLabs as the default provider. The rule activates automatically when the AI agent works on voiceover in a Remotion project.

This rule is intended for developers building programmatic video with React via Remotion who want to add narration or voiceover without manual audio recording.

§02

How it saves time or tokens

Without this rule, developers must figure out the TTS integration, audio file management, and composition duration matching from scratch. The rule provides a ready-made pattern: generate MP3 files per scene, use calculateMetadata to dynamically size the composition to match audio length, and wire everything together. This eliminates trial-and-error and keeps the AI coding agent on the correct path, saving both developer time and token usage during AI-assisted development.

§03

How to use

  1. Install the Remotion skills package:
     npx skills add remotion-dev/skills
  2. Set your ElevenLabs API key as an environment variable:
     export ELEVENLABS_API_KEY=your_key_here
  3. Create a voiceover generation script that reads your scene config and calls the ElevenLabs API for each scene, writing MP3 files to the public/ directory. Then run it:
     node --strip-types generate-voiceover.ts
  4. Use calculateMetadata in your Remotion composition to read the audio duration and set the composition length dynamically.
§04

Example

A minimal voiceover generation script structure:

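import { ElevenLabsClient } from 'elevenlabs';
import fs from 'fs';

// The API key is read from the environment; set ELEVENLABS_API_KEY before running.
const client = new ElevenLabsClient({
  apiKey: process.env.ELEVENLABS_API_KEY,
});

// Synthesize one piece of narration and write it to outputPath as an MP3 file.
async function generateVoiceover(text: string, outputPath: string) {
  const audio = await client.generate({
    text,
    voice: 'Rachel',
    model_id: 'eleven_multilingual_v2',
  });
  // Assumption: the SDK returns the audio as a stream rather than a Response,
  // so collect its chunks instead of calling arrayBuffer().
  const chunks: Uint8Array[] = [];
  for await (const chunk of audio) {
    chunks.push(chunk);
  }
  fs.writeFileSync(outputPath, Buffer.concat(chunks));
}

// Generate per scene
const scenes = [
  { text: 'Welcome to our product demo.', file: 'public/voice-1.mp3' },
  { text: 'Here is the key feature.', file: 'public/voice-2.mp3' },
];

// Top-level await requires the script to run as an ES module.
for (const scene of scenes) {
  await generateVoiceover(scene.text, scene.file);
}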

Then in your Remotion composition, use calculateMetadata to read audio duration and set durationInFrames accordingly.
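
As a rough sketch of that step, the helper below sums per-scene audio durations and converts them to frames. It deliberately avoids importing Remotion so that it runs standalone: `makeCalculateMetadata`, `DurationReader`, `VoiceoverProps`, and the 30 fps value are illustrative assumptions, not names taken from the rule. In a real project, the duration reader would probably wrap `getAudioDurationInSeconds()` from `@remotion/media-utils`, and the returned function would be passed to the `calculateMetadata` prop of `<Composition>`.

```typescript
// Hypothetical injection point: resolves an audio source to its length in seconds.
type DurationReader = (src: string) => Promise<number>;

// Assumed props shape: one audio file per scene.
type VoiceoverProps = { audioFiles: string[] };

// Subset of the fields a calculateMetadata callback can return.
type Metadata = { durationInFrames: number; fps: number };

const FPS = 30; // assumed frame rate

// Convert seconds to whole frames, rounding up so the audio is never clipped.
function secondsToFrames(seconds: number, fps: number): number {
  return Math.ceil(seconds * fps);
}

// Build a calculateMetadata-style callback around a given duration reader.
function makeCalculateMetadata(readDuration: DurationReader) {
  return async ({ props }: { props: VoiceoverProps }): Promise<Metadata> => {
    const durations = await Promise.all(props.audioFiles.map(readDuration));
    const frames = durations.reduce(
      (sum, seconds) => sum + secondsToFrames(seconds, FPS),
      0,
    );
    // Guard against an empty scene list producing a zero-length composition.
    return { durationInFrames: Math.max(1, frames), fps: FPS };
  };
}
```

Rounding each scene up with `Math.ceil` errs on the side of a few extra silent frames rather than clipped speech.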

§05

Related on TokRepo

§06

Common pitfalls

  • Forgetting to set the ELEVENLABS_API_KEY environment variable before running the generation script. Without a key, the ElevenLabs API rejects the request with an authentication error.
  • Not using calculateMetadata to dynamically size the composition. Hardcoding durationInFrames leads to audio being cut off or having long silence at the end.
  • Generating all voiceover audio on every render. Cache the MP3 files and only regenerate when the script text changes.
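
The first and last pitfalls can be guarded against with a few lines in the generation script. The sketch below is one possible approach, not the rule's own implementation: `requireApiKey`, `voiceoverHash`, `needsRegeneration`, `recordGeneration`, and the `.hash` sidecar file are all hypothetical names chosen for illustration.

```typescript
import { createHash } from "node:crypto";
import { existsSync, readFileSync, writeFileSync } from "node:fs";

// Fail fast with a clear message instead of waiting for the API to reject the call.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env.ELEVENLABS_API_KEY;
  if (!key) {
    throw new Error("ELEVENLABS_API_KEY is not set");
  }
  return key;
}

// Fingerprint of the inputs that determine what the audio sounds like.
function voiceoverHash(text: string, voice: string): string {
  return createHash("sha256").update(`${voice}\n${text}`).digest("hex");
}

// True when the MP3 is missing or was generated from different text or voice.
function needsRegeneration(mp3Path: string, text: string, voice: string): boolean {
  const hashPath = `${mp3Path}.hash`; // hypothetical sidecar file next to the MP3
  if (!existsSync(mp3Path) || !existsSync(hashPath)) {
    return true;
  }
  return readFileSync(hashPath, "utf8").trim() !== voiceoverHash(text, voice);
}

// Call after a successful generation so the next run can skip this scene.
function recordGeneration(mp3Path: string, text: string, voice: string): void {
  writeFileSync(`${mp3Path}.hash`, voiceoverHash(text, voice));
}
```

A generation loop would call `requireApiKey(process.env)` once at startup, then wrap each scene in `if (needsRegeneration(...))` and call `recordGeneration(...)` after writing the MP3.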

FAQ

Which TTS providers does this rule support?

The rule defaults to ElevenLabs but explicitly states that any TTS service producing an audio file can be substituted. You need to swap out the API call in the generation script while keeping the same file output pattern.
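
One way to keep that swap small is to hide the provider behind a narrow interface. The following is a minimal sketch under that assumption; `TtsProvider`, `synthesizeScenes`, and `fakeProvider` are hypothetical names, and the fake provider returns text bytes rather than real audio.

```typescript
// Hypothetical provider contract: turn text into encoded audio bytes.
interface TtsProvider {
  synthesize(text: string): Promise<Uint8Array>;
}

type Scene = { text: string; file: string };
type RenderedScene = { file: string; audio: Uint8Array };

// Provider-agnostic loop: only `provider` changes when the TTS service is swapped.
async function synthesizeScenes(
  provider: TtsProvider,
  scenes: Scene[],
): Promise<RenderedScene[]> {
  const rendered: RenderedScene[] = [];
  for (const scene of scenes) {
    // Sequential on purpose, to stay gentle on provider rate limits.
    const audio = await provider.synthesize(scene.text);
    rendered.push({ file: scene.file, audio });
  }
  return rendered;
}

// Stand-in provider for local testing: returns the UTF-8 bytes of the text, not audio.
const fakeProvider: TtsProvider = {
  synthesize: async (text) => new TextEncoder().encode(text),
};
```

Each result can then be written with `fs.writeFileSync(file, audio)`, which preserves the one-file-per-scene output pattern regardless of provider.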

How does the composition know the audio duration?

The rule uses Remotion's calculateMetadata function to read the generated audio file, extract its duration, and set the composition's durationInFrames dynamically. This ensures video length matches the voiceover exactly.

Can I use multiple voices for different scenes?

Yes. The generation script processes scenes independently, so you can specify a different voice ID or provider per scene. Just update the voice parameter in each generation call.
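
As an illustration, the scene config could carry an optional voice with a fallback. This is a sketch only: `VoicedScene`, `withResolvedVoices`, and the `second-voice-id` value are hypothetical, and the `generateVoiceover` function from the example would need an extra parameter to accept the resolved voice, since it currently hardcodes one.

```typescript
// Scene config with an optional per-scene voice (assumed shape).
type VoicedScene = { text: string; file: string; voice?: string };

const DEFAULT_VOICE = "Rachel"; // same voice as the example script

// Fill in the default voice wherever a scene does not name one.
function withResolvedVoices(
  scenes: VoicedScene[],
  fallback: string = DEFAULT_VOICE,
): Required<VoicedScene>[] {
  return scenes.map((scene) => ({ ...scene, voice: scene.voice ?? fallback }));
}

const voicedScenes: VoicedScene[] = [
  { text: "Welcome to our product demo.", file: "public/voice-1.mp3" },
  // "second-voice-id" is a placeholder, not a real voice identifier.
  { text: "Here is the key feature.", file: "public/voice-2.mp3", voice: "second-voice-id" },
];

const resolvedScenes = withResolvedVoices(voicedScenes);
```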

Does this work with Remotion Lambda for cloud rendering?

Yes, but you must generate the audio files before the render starts and include them in the bundle. Remotion Lambda bundles the public/ directory, so pre-generated MP3 files are available during cloud rendering.

How do I install this rule in my project?

Run npx skills add remotion-dev/skills in your project directory. The rule activates automatically when the AI agent detects voiceover-related work in a Remotion project. No manual configuration is needed beyond the TTS API key.

Source and acknowledgements

Created by Remotion. Licensed under MIT. remotion-dev/skills — Rule: voiceover

Part of the Remotion AI Skill collection on TokRepo.
