Skills · Mar 29, 2026 · 2 min read

Remotion Rule: Voiceover

Remotion skill rule: Adding AI-generated voiceover to Remotion compositions using TTS. Part of the official Remotion Agent Skill for programmatic video in React.

TL;DR
Add AI voiceover to Remotion compositions using ElevenLabs TTS with automatic duration matching.
§01

What it is

The Remotion Voiceover Rule is a skill rule from the official Remotion Agent Skill set. It provides a structured pattern for adding AI-generated speech audio to Remotion video compositions using text-to-speech (TTS) services, with ElevenLabs as the default provider. The rule activates automatically when working with voiceover in a Remotion project.

This rule is intended for developers building programmatic video with React via Remotion who want to add narration or voiceover without manual audio recording.

§02

How it saves time or tokens

Without this rule, developers must figure out the TTS integration, audio file management, and composition duration matching from scratch. The rule provides a ready-made pattern: generate MP3 files per scene, use calculateMetadata to dynamically size the composition to match audio length, and wire everything together. This eliminates trial-and-error and keeps the AI coding agent on the correct path, saving both developer time and token usage during AI-assisted development.

§03

How to use

  1. Install the Remotion skills package:
npx skills add remotion-dev/skills
  2. Set your ElevenLabs API key as an environment variable:
export ELEVENLABS_API_KEY=your_key_here
  3. Create a voiceover generation script that reads your scene config and calls the ElevenLabs API for each scene, writing MP3 files to the public/ directory. Then run it:
node --strip-types generate-voiceover.ts
  4. Use calculateMetadata in your Remotion composition to read the audio duration and set the composition length dynamically.
§04

Example

A minimal voiceover generation script structure:

import { ElevenLabsClient } from 'elevenlabs';
import fs from 'fs';

const client = new ElevenLabsClient({
  apiKey: process.env.ELEVENLABS_API_KEY,
});

async function generateVoiceover(text: string, outputPath: string) {
  // client.generate() resolves to an audio stream, not a Response,
  // so collect its chunks rather than calling arrayBuffer().
  const audio = await client.generate({
    text,
    voice: 'Rachel',
    model_id: 'eleven_multilingual_v2',
  });
  const chunks: Buffer[] = [];
  for await (const chunk of audio) {
    chunks.push(Buffer.from(chunk));
  }
  fs.writeFileSync(outputPath, Buffer.concat(chunks));
}

// Generate per scene
const scenes = [
  { text: 'Welcome to our product demo.', file: 'public/voice-1.mp3' },
  { text: 'Here is the key feature.', file: 'public/voice-2.mp3' },
];

for (const scene of scenes) {
  await generateVoiceover(scene.text, scene.file);
}

Then in your Remotion composition, use calculateMetadata to read audio duration and set durationInFrames accordingly.
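A minimal sketch of that calculateMetadata step, assuming Remotion v4 and @remotion/media-utils; the `audioFile` prop name and 30 fps are illustrative choices, not part of the rule:

```typescript
import {staticFile, type CalculateMetadataFunction} from 'remotion';
import {getAudioDurationInSeconds} from '@remotion/media-utils';

const FPS = 30;

type Props = {audioFile: string};

export const calculateMetadata: CalculateMetadataFunction<Props> = async ({
  props,
}) => {
  // Measure the pre-generated MP3 served from public/.
  const seconds = await getAudioDurationInSeconds(staticFile(props.audioFile));
  return {
    fps: FPS,
    // Round up so the tail of the audio is never clipped.
    durationInFrames: Math.ceil(seconds * FPS),
  };
};
```

Wire it up by passing calculateMetadata and a defaultProps audioFile to your `<Composition>` alongside the usual id, component, width, and height props.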

§05

Common pitfalls

  • Forgetting to set the ELEVENLABS_API_KEY environment variable before running the generation script. The script will fail silently or throw an auth error.
  • Not using calculateMetadata to dynamically size the composition. Hardcoding durationInFrames leads to audio being cut off or having long silence at the end.
  • Generating all voiceover audio on every render. Cache the MP3 files and only regenerate when the script text changes.

Frequently Asked Questions

Which TTS providers does this rule support?

The rule defaults to ElevenLabs but explicitly states that any TTS service producing an audio file can be substituted. You need to swap out the API call in the generation script while keeping the same file output pattern.

How does the composition know the audio duration?

The rule uses Remotion's calculateMetadata function to read the generated audio file, extract its duration, and set the composition's durationInFrames dynamically. This ensures video length matches the voiceover exactly.

Can I use multiple voices for different scenes?

Yes. The generation script processes scenes independently, so you can specify a different voice ID or provider per scene. Just update the voice parameter in each generation call.
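One way to sketch that: extend the scene config with a per-scene voice field (the `voice` field and the second voice name are illustrative additions, not part of the rule's config):

```typescript
// Each scene carries its own voice ID alongside text and output path.
type Scene = {text: string; file: string; voice: string};

const scenes: Scene[] = [
  {text: 'Welcome to our product demo.', file: 'public/voice-1.mp3', voice: 'Rachel'},
  {text: 'Here is the key feature.', file: 'public/voice-2.mp3', voice: 'Adam'},
];
```

Then pass scene.voice into each generation call instead of a hardcoded name.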

Does this work with Remotion Lambda for cloud rendering?

Yes, but you must generate the audio files before the render starts and include them in the bundle. Remotion Lambda bundles the public/ directory, so pre-generated MP3 files are available during cloud rendering.

How do I install this rule in my project?

Run npx skills add remotion-dev/skills in your project directory. The rule activates automatically when the AI agent detects voiceover-related work in a Remotion project. No manual configuration is needed beyond the TTS API key.


Source & Thanks

Created by Remotion. Licensed under MIT. remotion-dev/skills — Rule: voiceover

Part of the Remotion AI Skill collection on TokRepo.
