Skills · March 29, 2026 · 1 min read

Remotion Rule: Transcribe Captions

Remotion skill rule: Transcribing audio to generate captions in Remotion. Part of the official Remotion Agent Skill for programmatic video in React.

Skill Factory · Community

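Agent ready

Directly installable by agents

This asset is installable. The agent first selects the current runtime, reviews the install plan, and then runs the matching command.

Native · 98/100 · Policy: allowed

Agent entry point: any MCP/CLI agent
Type: Skill
Install: Single
Trust level: Established
Entry point: Remotion Rule: Transcribe Captions

Direct install command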

npx -y tokrepo@latest install 957e8918-a9ce-4b3e-97e1-6493d6ed75db --target codex

Confirm the install plan with a dry run before running this command.

TL;DR

Use @remotion/install-whisper-cpp to transcribe audio and generate timed captions for Remotion videos.

§01

What it is

This is a Remotion skill rule for transcribing audio files to generate timed captions in programmatic video projects. It uses the @remotion/install-whisper-cpp package, which installs Whisper.cpp on your machine for local speech-to-text transcription without sending audio to external APIs.

The rule is part of the official Remotion Agent Skill collection. It targets developers building React-based videos with Claude Code, Cursor, or OpenAI Codex who need automatic subtitles.

§02

How it saves time or tokens

Manual caption creation is tedious and error-prone. This rule teaches AI coding assistants the exact sequence: install whisper-cpp, download a model, transcribe audio, and convert the output to Remotion's caption format. Without the rule, an assistant might suggest external transcription APIs or generate incorrect package imports. The rule ensures the assistant produces working transcription code on the first attempt.

§03

How to use

Install the Remotion skills collection:

npx skills add remotion-dev/skills

Add the whisper-cpp package to your Remotion project:

npx remotion add @remotion/install-whisper-cpp

Create a Node.js script that downloads the model, transcribes audio, and outputs caption data.

§04

Example

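import path from 'path';
import {
  downloadWhisperModel,
  installWhisperCpp,
  transcribe,
  toCaptions,
} from '@remotion/install-whisper-cpp';

// Install Whisper.cpp into ./whisper.cpp, pinning the version explicitly.
const whisperPath = path.join(process.cwd(), 'whisper.cpp');
const whisperCppVersion = '1.5.5';
await installWhisperCpp({ to: whisperPath, version: whisperCppVersion });

// Download the English-only medium model into the same folder.
await downloadWhisperModel({
  model: 'medium.en',
  folder: whisperPath,
});

// Transcribe a 16 kHz WAV file with per-token timing.
const whisperCppOutput = await transcribe({
  inputPath: 'src/audio/narration.wav',
  whisperPath,
  whisperCppVersion,
  model: 'medium.en',
  tokenLevelTimestamps: true,
});

// Convert the Whisper.cpp output to Remotion's caption format.
const { captions } = toCaptions({ whisperCppOutput });
console.log(JSON.stringify(captions, null, 2));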

§05

Related on TokRepo

Featured Workflows — Top-rated skills and workflows for AI-assisted development
AI Tools for Video — Tools for video generation and editing with AI

This tool integrates with standard development workflows and requires minimal configuration to get started. It is available as open-source software with documentation and community support through the official repository. The project follows semantic versioning for stable releases.

For teams evaluating this tool, the key advantage is reducing manual work in repetitive tasks. The automation provided by the built-in features means less custom code to maintain and fewer integration points to manage. This translates directly to lower maintenance costs and faster iteration cycles.

§06

Common pitfalls

The medium.en model handles English only. For multilingual transcription, use medium or one of the large models; the large models are slower to run.
Whisper.cpp compiles native code on first install, so your system needs a C++ toolchain (Xcode CLI tools on macOS, build-essential on Linux).
Token-level timestamps (tokenLevelTimestamps: true) are required for word-by-word caption rendering; without them you only get segment-level timing.
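To illustrate why token-level timing matters for word-by-word rendering, here is a minimal sketch that finds which word should be highlighted at a given frame. The Caption type below is a local stand-in whose field names (text, startMs, endMs) are assumed to match the objects returned by toCaptions; the sample data and the activeCaptionIndex helper are hypothetical.

```typescript
// Local stand-in for a timed caption; the field names are an assumption
// about the shape of the objects produced by toCaptions().
type Caption = {
  text: string;
  startMs: number;
  endMs: number;
};

// Returns the index of the caption whose [startMs, endMs) window contains
// timeMs, or -1 when no word is being spoken at that moment.
function activeCaptionIndex(captions: Caption[], timeMs: number): number {
  return captions.findIndex((c) => timeMs >= c.startMs && timeMs < c.endMs);
}

// Hypothetical token-level data for the phrase "Hello world again".
const words: Caption[] = [
  { text: 'Hello', startMs: 0, endMs: 400 },
  { text: ' world', startMs: 400, endMs: 900 },
  { text: ' again', startMs: 1000, endMs: 1500 },
];

// Convert a frame number to milliseconds: at 30 fps, frame 15 is 500 ms.
const fps = 30;
const frame = 15;
const timeMs = (frame / fps) * 1000;
const highlighted = activeCaptionIndex(words, timeMs);
```

With segment-level timing only, each entry would span a whole sentence, so a lookup like this could not single out individual words.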

Frequently asked questions

What audio formats does Remotion Whisper transcription support?

The transcribe function accepts WAV files sampled at 16 kHz. If your audio is in MP3 or AAC format, convert it to 16 kHz WAV first using ffmpeg. Remotion's own audio extraction tools can also produce WAV output from video files.
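Whisper.cpp works on 16 kHz WAV input, so the conversion step usually includes a resample. The following is a small sketch that only builds the ffmpeg argument list; the helper name buildWavConversionArgs and the file paths are hypothetical, and ffmpeg is assumed to be on the PATH when the command is actually run.

```typescript
// Builds the argument list for converting any ffmpeg-readable file to WAV.
// -y overwrites an existing output file; -ar 16000 resamples to 16 kHz.
function buildWavConversionArgs(inputPath: string, outputPath: string): string[] {
  return ['-y', '-i', inputPath, '-ar', '16000', outputPath];
}

// Hypothetical paths, for illustration only.
const ffmpegArgs = buildWavConversionArgs(
  'src/audio/narration.mp3',
  'src/audio/narration.wav',
);

// Print the full command. In a Node.js script the list could instead be
// passed to child_process, e.g. execFileSync('ffmpeg', ffmpegArgs).
console.log(['ffmpeg', ...ffmpegArgs].join(' '));
```

Passing the arguments as an array rather than a single shell string avoids quoting problems when paths contain spaces.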

Can I use a GPU to speed up transcription?

Whisper.cpp supports GPU acceleration via Metal on macOS and CUDA on Linux. The installWhisperCpp function compiles with available GPU support automatically. GPU transcription is significantly faster for longer audio files.

How accurate are the generated captions?

Accuracy depends on the model size. The medium.en model provides good accuracy for clear English speech. The large model improves accuracy for accented speech and noisy audio but takes longer to process.

Do I need an API key for transcription?

No. Whisper.cpp runs entirely locally on your machine. There are no API calls, no rate limits, and no costs beyond compute time. Your audio never leaves your system.

Can I edit captions after transcription?

Yes. The toCaptions function returns a JSON array of timed caption objects. You can filter, merge, or modify entries programmatically before passing them to your Remotion composition for rendering.
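As a rough sketch of that kind of post-processing, the code below drops whitespace-only entries and merges words separated by short pauses into a single caption. The TimedCaption type is a local stand-in (its field names are assumed to match the toCaptions output), and dropEmptyCaptions, mergeCloseCaptions, and the sample data are hypothetical.

```typescript
// Local stand-in for a timed caption entry (assumed field names).
type TimedCaption = { text: string; startMs: number; endMs: number };

// Drops entries that contain only whitespace.
function dropEmptyCaptions(captions: TimedCaption[]): TimedCaption[] {
  return captions.filter((c) => c.text.trim().length > 0);
}

// Merges consecutive captions into one entry whenever the pause between
// them is at most maxGapMs. Text is concatenated as-is, so leading spaces
// inside tokens are preserved. The input array is not mutated.
function mergeCloseCaptions(
  captions: TimedCaption[],
  maxGapMs: number,
): TimedCaption[] {
  const merged: TimedCaption[] = [];
  for (const caption of captions) {
    const last = merged[merged.length - 1];
    if (last !== undefined && caption.startMs - last.endMs <= maxGapMs) {
      last.text += caption.text;
      last.endMs = caption.endMs;
    } else {
      merged.push({ ...caption });
    }
  }
  return merged;
}

// Hypothetical word-level captions.
const raw: TimedCaption[] = [
  { text: 'Hello', startMs: 0, endMs: 400 },
  { text: ' world', startMs: 420, endMs: 900 },
  { text: '  ', startMs: 900, endMs: 950 },
  { text: 'Next', startMs: 2000, endMs: 2400 },
];

// Filter first, then merge words whose pause is 100 ms or less.
const edited = mergeCloseCaptions(dropEmptyCaptions(raw), 100);
console.log(JSON.stringify(edited, null, 2));
```

The edited array can then be passed to the composition in place of the raw transcription output.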

Sources (3)

Remotion Whisper Docs: Remotion install-whisper-cpp provides local speech-to-text transcription
Whisper.cpp GitHub: Whisper.cpp is a C++ port of OpenAI Whisper for local inference
Remotion Skills GitHub: Remotion Agent Skill is the official skill set for AI-assisted video creation

Source and acknowledgments

Created by Remotion. Licensed under MIT. remotion-dev/skills — Rule: transcribe-captions

Part of the Remotion AI Skill collection on TokRepo.

