## How It Works
- Define your script — text for each scene in a config file
- Generate audio — a script calls the ElevenLabs API and writes MP3s to `public/`
- Dynamic duration — `calculateMetadata` reads the audio duration and sizes the composition accordingly
- Render — Remotion renders the video with the synced voiceover
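The first step above takes per-scene text from a config file. A minimal sketch of what that config might look like (the shape and field names here are assumptions, not the skill's actual schema):

```typescript
// Hypothetical scene config; the real skill's schema may differ.
type Scene = { id: number; text: string };

const scenes: Scene[] = [
  { id: 1, text: "Welcome to the demo." },
  { id: 2, text: "Here is how the render pipeline works." },
];

export default scenes;
```

Each entry would then drive one text-to-speech request and one MP3 in `public/`.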
## Generating Audio with ElevenLabs
```ts
// generate-voiceover.ts
const response = await fetch(
  // voiceId is the ElevenLabs voice ID you want to use
  `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`,
  {
    method: "POST",
    headers: {
      "xi-api-key": process.env.ELEVENLABS_API_KEY!,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      text: sceneText,
      model_id: "eleven_multilingual_v2",
    }),
  },
);
// Write audio to public/voiceover-scene-1.mp3
```

Run: `node --strip-types generate-voiceover.ts`
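The plumbing around that request can be sketched as two small helpers — one building the JSON body per scene, one choosing the output path that matches the `public/voiceover-scene-N.mp3` convention. The `Scene` shape and helper names are assumptions for illustration, not the skill's actual API:

```typescript
// Hypothetical helpers; Scene shape and names are assumptions.
type Scene = { id: number; text: string };

// JSON body sent to the ElevenLabs text-to-speech endpoint for one scene
const buildRequestBody = (scene: Scene): string =>
  JSON.stringify({ text: scene.text, model_id: "eleven_multilingual_v2" });

// Where the generated MP3 for a scene lands in public/
const outputPath = (scene: Scene): string =>
  `public/voiceover-scene-${scene.id}.mp3`;
```

Looping over the config and writing each response body to `outputPath(scene)` yields one MP3 per scene.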
## Dynamic Composition Duration
```ts
import { getAudioDurationInSeconds } from "@remotion/media-utils";
import { staticFile } from "remotion";

export const calculateMetadata = async () => {
  const duration = await getAudioDurationInSeconds(staticFile("voiceover.mp3"));
  // 30 fps; round up so the tail of the audio is not cut off
  return { durationInFrames: Math.ceil(duration * 30) };
};
```

## FAQ
Q: What TTS service does the Remotion voiceover skill use?
A: ElevenLabs by default, but any TTS service that produces audio files can be substituted.
Q: Does the video duration auto-adjust to the voiceover?
A: Yes. The skill uses Remotion's calculateMetadata to dynamically set composition duration based on the generated audio length.
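The auto-adjustment described above reduces to simple frame arithmetic: seconds of audio times the frame rate, rounded up so no audio is clipped. A standalone sketch, assuming a 30 fps composition:

```typescript
// Convert a voiceover duration in seconds to a frame count at 30 fps,
// rounding up so the final moment of audio is not clipped.
const fps = 30;
const durationInFrames = (seconds: number): number => Math.ceil(seconds * fps);

durationInFrames(12.5); // 375 frames for a 12.5 s voiceover
```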