Skills · Mar 29, 2026 · 1 min read

Remotion Rule: Transcribe Captions

Remotion skill rule: Transcribing audio to generate captions in Remotion. Part of the official Remotion Agent Skill for programmatic video in React.

Skill Factory · Community

Agent ready

Ready-to-run agent install

This asset can be installed after the agent chooses its runtime, checks the plan, and runs the matching command.

Native · 98/100 · Policy: allow

Agent surface: Any MCP/CLI agent
Kind: Skill
Install: Single
Trust: Established
Entrypoint: Remotion Rule: Transcribe Captions

Direct install command

npx -y tokrepo@latest install 957e8918-a9ce-4b3e-97e1-6493d6ed75db --target codex

Run the command after a dry run confirms the install plan.

TL;DR

Use @remotion/install-whisper-cpp to transcribe audio and generate timed captions for Remotion videos.

§01

What it is

This is a Remotion skill rule for transcribing audio files to generate timed captions in programmatic video projects. It uses the @remotion/install-whisper-cpp package, which installs Whisper.cpp (a C++ port of OpenAI's Whisper) into a folder of your project, so speech-to-text runs locally without sending audio to external APIs.

The rule is part of the official Remotion Agent Skill collection. It targets developers who build React-based videos with AI coding assistants such as Claude Code, Cursor, or OpenAI Codex and need automatic subtitles.

§02

How it saves time or tokens

Manual caption creation is tedious and error-prone. This rule teaches AI coding assistants the exact sequence: install Whisper.cpp, download a model, transcribe the audio, and convert the output to Remotion's caption format. Without the rule, an assistant might suggest external transcription APIs or generate incorrect package imports. With it, the assistant is much more likely to produce working transcription code on the first attempt.

§03

How to use

Install the Remotion skills collection:

npx skills add remotion-dev/skills

Add the whisper-cpp package to your Remotion project:

npx remotion add @remotion/install-whisper-cpp

Create a Node.js script that downloads the model, transcribes audio, and outputs caption data.

§04

Example

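```ts
// Run as an ES module (e.g. transcribe.mjs or .mts) so top-level await works.
import fs from 'fs';
import path from 'path';
import {
  downloadWhisperModel,
  installWhisperCpp,
  transcribe,
  toCaptions,
} from '@remotion/install-whisper-cpp';

// Whisper.cpp is installed into ./whisper.cpp. installWhisperCpp() requires a
// version; pass the same version to transcribe() so its output is parsed correctly.
const whisperPath = path.join(process.cwd(), 'whisper.cpp');
const whisperCppVersion = '1.5.5';

await installWhisperCpp({ to: whisperPath, version: whisperCppVersion });
await downloadWhisperModel({
  model: 'medium.en',
  folder: whisperPath,
});

// inputPath must be a 16 kHz WAV file; convert other formats with ffmpeg first.
const whisperCppOutput = await transcribe({
  inputPath: path.join(process.cwd(), 'src/audio/narration.wav'),
  whisperPath,
  whisperCppVersion,
  model: 'medium.en',
  tokenLevelTimestamps: true,
});

// toCaptions() converts the raw Whisper.cpp output into Remotion Caption objects.
const { captions } = toCaptions({ whisperCppOutput });
fs.writeFileSync('captions.json', JSON.stringify(captions, null, 2));
```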

§05

Related on TokRepo

Featured Workflows — Top-rated skills and workflows for AI-assisted development
AI Tools for Video — Tools for video generation and editing with AI

Setup is limited to the two install commands above; the transcription itself runs as an ordinary Node.js script. Both the skill rules (remotion-dev/skills, MIT-licensed) and Whisper.cpp are open source, with documentation and issue trackers in their official repositories.

For teams evaluating this rule, the main advantage is that captions come straight from the audio instead of being typed and timed by hand. Because the script can simply be rerun whenever the narration changes, there is less custom caption code to maintain.

§06

Common pitfalls

The medium.en model is English-only. For other languages, use a multilingual model such as medium or one of the large models; larger models are more accurate but slower to process.
Whisper.cpp compiles native code on first install, so your system needs a C++ toolchain (Xcode CLI tools on macOS, build-essential on Linux).
Token-level timestamps (tokenLevelTimestamps: true) are required for word-by-word caption rendering; without them you only get segment-level timing.
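With token-level timestamps, each caption typically covers a single word or token, so highlighting the current word comes down to converting the current frame to milliseconds and finding the caption whose time range contains it. The sketch below is an illustration, not part of the rule: it assumes a local Caption type mirroring the fields of Remotion's Caption type, and the helper names frameToMs and activeCaptionIndex are hypothetical. In a real composition, frame and fps would come from Remotion's useCurrentFrame() and useVideoConfig().

```typescript
// Minimal caption shape, assumed to mirror the fields of Remotion's Caption type.
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Convert a Remotion frame number to milliseconds at the given frame rate.
function frameToMs(frame: number, fps: number): number {
  return (frame / fps) * 1000;
}

// Index of the word being spoken at `frame`, or -1 between or after words.
// A word is active from startMs (inclusive) to endMs (exclusive).
function activeCaptionIndex(captions: Caption[], frame: number, fps: number): number {
  const t = frameToMs(frame, fps);
  return captions.findIndex((c) => t >= c.startMs && t < c.endMs);
}

// Hand-written sample data, not real transcription output.
const words: Caption[] = [
  { text: ' Hello', startMs: 0, endMs: 400, timestampMs: 200, confidence: 0.98 },
  { text: ' world', startMs: 400, endMs: 900, timestampMs: 650, confidence: 0.95 },
];

// At 30 fps, frame 15 is 500 ms in, so ' world' (index 1) is the active word.
console.log(activeCaptionIndex(words, 15, 30));
```

A linear findIndex keeps the sketch simple; for very long transcripts, a binary search over the sorted captions would avoid scanning the whole array on every frame.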

Frequently Asked Questions

What audio formats does Remotion Whisper transcription support?

The transcribe function expects a WAV file with a 16 kHz sample rate. If your audio is in MP3 or AAC format, convert it to a 16 kHz WAV file first using ffmpeg. Remotion's own audio extraction tools can also produce WAV output from video files.
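The conversion can be scripted from the same Node.js project. This sketch shells out to ffmpeg with the flags suggested in the whisper.cpp README (16 kHz, mono, 16-bit PCM). It assumes ffmpeg is installed and on PATH, and the helper names whisperWavArgs and convertToWhisperWav are hypothetical.

```typescript
import { execFileSync } from 'node:child_process';

// Arguments that convert any ffmpeg-readable input (MP3, AAC, a video file, ...)
// into 16 kHz, mono, 16-bit PCM WAV.
function whisperWavArgs(inputPath: string, outputPath: string): string[] {
  return [
    '-y', // overwrite outputPath if it already exists
    '-i', inputPath,
    '-ar', '16000', // resample to 16 kHz
    '-ac', '1', // downmix to mono
    '-c:a', 'pcm_s16le', // 16-bit little-endian PCM
    outputPath,
  ];
}

// Run ffmpeg synchronously; throws if ffmpeg is missing or exits with an error.
function convertToWhisperWav(inputPath: string, outputPath: string): void {
  execFileSync('ffmpeg', whisperWavArgs(inputPath, outputPath), { stdio: 'inherit' });
}

// Usage (placeholder paths):
// convertToWhisperWav('src/audio/narration.mp3', 'src/audio/narration.wav');
```

Using execFileSync with an argument array rather than a shell string avoids quoting problems when file paths contain spaces.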

Can I use a GPU to speed up transcription?

Whisper.cpp supports GPU acceleration via Metal on Apple Silicon Macs and via CUDA on NVIDIA GPUs. Metal is used by default when whisper.cpp is built on macOS, but CUDA support has to be enabled at compile time, so check the build output before assuming a Linux install uses the GPU. GPU transcription is significantly faster for longer audio files.

How accurate are the generated captions?

Accuracy depends on the model size. The medium.en model provides good accuracy for clear English speech. The large model improves accuracy for accented speech and noisy audio but takes longer to process.
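The trade-off above can be encoded in a small helper. This is a hypothetical sketch: the function pickWhisperModel is not part of any package, and the model names are assumed to match those accepted by downloadWhisperModel() and transcribe(); check the package documentation for the exact list.

```typescript
// Hypothetical helper. Model names are assumed to match the options of
// @remotion/install-whisper-cpp.
type WhisperModelName = 'medium.en' | 'medium' | 'large-v3';

function pickWhisperModel(language: string, hardAudio: boolean): WhisperModelName {
  // Accented speech or noisy recordings: trade speed for a larger model.
  if (hardAudio) return 'large-v3';
  // Clear speech: English-only medium.en for English, multilingual medium otherwise.
  return language.toLowerCase().startsWith('en') ? 'medium.en' : 'medium';
}

console.log(pickWhisperModel('en-US', false));
console.log(pickWhisperModel('de', false));
console.log(pickWhisperModel('en', true));
```

Pass the returned name to both downloadWhisperModel() and transcribe() so the downloaded model matches the one used for transcription.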

Do I need an API key for transcription?

No. Whisper.cpp runs entirely locally on your machine. There are no API calls, no rate limits, and no costs beyond compute time. Your audio never leaves your system.

Can I edit captions after transcription?

Yes. The toCaptions function returns an object whose captions property is an array of timed Caption objects (text, startMs, endMs, and related fields). You can filter, merge, or modify entries programmatically before passing them to your Remotion composition for rendering.
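A few typical edits, sketched as plain functions over the captions array: dropping empty or low-confidence tokens, shifting timings by a fixed offset, and merging words into longer lines. The Caption type here is assumed to mirror Remotion's, and the helper names and the confidence threshold are hypothetical choices for illustration.

```typescript
// Minimal caption shape, assumed to mirror the fields of Remotion's Caption type.
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
  timestampMs: number | null;
  confidence: number | null;
};

// Drop empty tokens and anything below a confidence threshold.
function cleanCaptions(captions: Caption[], minConfidence: number): Caption[] {
  return captions.filter(
    (c) => c.text.trim() !== '' && (c.confidence === null || c.confidence >= minConfidence),
  );
}

// Shift all timings by offsetMs, e.g. when the narration starts later in the video.
function shiftCaptions(captions: Caption[], offsetMs: number): Caption[] {
  return captions.map((c) => ({
    ...c,
    startMs: c.startMs + offsetMs,
    endMs: c.endMs + offsetMs,
    timestampMs: c.timestampMs === null ? null : c.timestampMs + offsetMs,
  }));
}

// Merge consecutive words into lines spanning at most maxLineMs.
// Assumes each word carries its own leading space, as in the sample data.
function mergeIntoLines(captions: Caption[], maxLineMs: number): Caption[] {
  const lines: Caption[] = [];
  for (const c of captions) {
    const last = lines[lines.length - 1];
    if (last !== undefined && c.endMs - last.startMs <= maxLineMs) {
      last.text += c.text;
      last.endMs = c.endMs;
    } else {
      // Copy so the input is never mutated; a merged line has no single
      // timestamp or confidence value.
      lines.push({ ...c, timestampMs: null, confidence: null });
    }
  }
  return lines;
}

// Hand-written sample data, not real transcription output.
const words: Caption[] = [
  { text: ' Hello', startMs: 0, endMs: 400, timestampMs: 200, confidence: 0.98 },
  { text: ' world', startMs: 400, endMs: 900, timestampMs: 650, confidence: 0.95 },
  { text: ' um', startMs: 1000, endMs: 1200, timestampMs: 1100, confidence: 0.2 },
  { text: ' again', startMs: 1500, endMs: 1900, timestampMs: 1700, confidence: 0.9 },
];

const lines = mergeIntoLines(shiftCaptions(cleanCaptions(words, 0.5), 2000), 1000);
console.log(lines.map((l) => `${l.startMs}-${l.endMs}:${l.text}`));
```

Each function returns a new array, so the steps can be chained in any order and the original transcription stays available for re-processing.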



Source & Thanks

Created by Remotion. Licensed under MIT. remotion-dev/skills, Rule: transcribe-captions

Part of the Remotion AI Skill collection on TokRepo.


Related Assets

Remotion Rule: Import Srt Captions

Remotion skill rule: Importing .srt subtitle files into Remotion using @remotion/captions. Part of the official Remotion Agent Skill for programmatic video in React.

Skills

Skill Factory

Remotion Rule: Get Audio Duration

Remotion skill rule: Getting the duration of an audio file in seconds with Mediabunny. Part of the official Remotion Agent Skill for programmatic video in React.

Skills

Skill Factory

Remotion Rule: Text Animations

Remotion skill rule: Typography and text animation patterns for Remotion. Part of the official Remotion Agent Skill for programmatic video in React.

Skills

Skill Factory

Remotion Rule: Videos

Remotion skill rule: Embedding videos in Remotion - trimming, volume, speed, looping, pitch. Part of the official Remotion Agent Skill for programmatic video in React.

Skills

Skill Factory