Skills · Mar 29, 2026 · 3 min read

Video AI Toolkit — Complete Collection

Curated video AI tools: Remotion (programmatic video), Manim (math animation), MoviePy (editing), Whisper (speech-to-text), ElevenLabs (voiceover). Build automated video pipelines.

Skill Factory · Community

Agent ready

Ready-to-run agent install

This asset can be installed after the agent chooses its runtime, checks the plan, and runs the matching command.

Native · 98/100 · Policy: allow

Agent surface: Any MCP/CLI agent
Kind: Skill
Install: Single
Trust: Established
Entrypoint: Video AI Toolkit — Complete Collection

Direct install command

npx -y tokrepo@latest install c13f8a88-c947-47b6-9364-5f1a43e5b3ea --target codex

Run after dry-run confirms the install plan.

TL;DR

A curated collection of video AI tools covering programmatic video, animation, editing, transcription, and voiceover.

§01

What it is

This collection brings together five video AI tools that form a complete automated video production pipeline: Remotion for programmatic video generation, Manim for math animations, MoviePy for video editing, Whisper for speech-to-text transcription, and ElevenLabs for AI voiceover.

The toolkit targets content creators, educators, and developers building automated video pipelines. Each tool handles a specific stage of video production, and they compose into end-to-end workflows.

Each tool is actively maintained by its own project and comes with its own documentation and community support for onboarding, so the collection suits both individual developers and teams integrating it into an existing toolchain.

§02

How it saves time or tokens

Manual video editing is time-intensive and non-reproducible. This toolkit automates each stage: generate visuals with code (Remotion/Manim), add voiceover (ElevenLabs), transcribe for captions (Whisper), and edit together (MoviePy). Once your pipeline is defined, producing variations or updates takes minutes instead of hours.

For teams evaluating multiple tools in the same category, the clear documentation and active community reduce the time spent on research and troubleshooting. Getting started takes minutes rather than hours of configuration.

§03

How to use

Choose your visual engine: Remotion for React-based video, Manim for mathematical animations.
Generate a voiceover script and synthesize audio with ElevenLabs TTS.
Combine visuals and audio using MoviePy for final composition, transitions, and export.
Generate captions by running Whisper on the final audio track.
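
The captioning step can be sketched as follows. This is a minimal sketch, assuming the open-source openai-whisper package and FFmpeg are installed; the helper names (srt_timestamp, segments_to_srt, transcribe_to_srt) are hypothetical and not part of the skill. It relies on Whisper's transcribe() returning a "segments" list whose entries carry "start", "end", and "text".

```python
"""Sketch: turn Whisper segments into an SRT caption file."""


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from dicts with 'start', 'end', and 'text' keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive caption blocks.
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, srt_path: str, model_name: str = "medium") -> None:
    """Run Whisper on an audio file and write the captions to srt_path."""
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    with open(srt_path, "w", encoding="utf-8") as handle:
        handle.write(segments_to_srt(result["segments"]))
```

Calling transcribe_to_srt('elevenlabs_voiceover.mp3', 'captions.srt') would write a caption file next to the final video. The formatting helpers are kept separate from the Whisper call so they can be checked without downloading a model.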

§04

Example

# MoviePy 1.x: combine a Remotion render with an ElevenLabs voiceover
# (MoviePy 2.x removed moviepy.editor and renamed these methods, e.g. with_audio)
from moviepy.editor import VideoFileClip, AudioFileClip

video = VideoFileClip('remotion_output.mp4')
audio = AudioFileClip('elevenlabs_voiceover.mp3')

# Set voiceover as the audio track
final = video.set_audio(audio)

# Add a one-second fade-in and fade-out to the picture
final = final.fadein(1).fadeout(1)

# Export with H.264 video and AAC audio
final.write_videofile('final_video.mp4', fps=30, codec='libx264', audio_codec='aac')

§05

Related on TokRepo

AI Tools for Video — Browse more AI-powered video tools and editors.
AI Tools for Voice — Text-to-speech and voice synthesis tools.

§06

Common pitfalls

Generating high-resolution video without checking render time first. Start with 720p for drafts and only render 1080p/4K for final output.
Using Whisper's largest model for simple transcription tasks. The medium model provides good accuracy for most content at a fraction of the compute cost.
Not syncing audio and video timelines before final export. Misaligned audio is immediately noticeable. Use MoviePy's set_start and set_duration (with_start and with_duration in MoviePy 2.x) for precise alignment.
Applying the skill without reading the documentation first. Each skill has specific prerequisites and configuration requirements that affect the quality of results.
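
The audio alignment pitfall above can be sketched as follows. This is a minimal sketch, assuming MoviePy 1.x; plan_voiceover and mux_aligned are hypothetical helper names, and the offset parameter is an illustrative assumption rather than anything the skill defines. The plan is computed separately so the timing logic can be checked before any rendering starts.

```python
def plan_voiceover(video_duration: float, audio_duration: float, offset: float = 0.0):
    """Return (start, length) so the voiceover never runs past the video."""
    if offset < 0 or offset >= video_duration:
        raise ValueError("offset must fall inside the video")
    length = min(audio_duration, video_duration - offset)
    return offset, length


def mux_aligned(video_path: str, audio_path: str, out_path: str, offset: float = 0.0) -> None:
    """Apply the plan with MoviePy 1.x (assumed; 2.x renames these methods)."""
    from moviepy.editor import AudioFileClip, CompositeAudioClip, VideoFileClip

    video = VideoFileClip(video_path)
    audio = AudioFileClip(audio_path)
    start, length = plan_voiceover(video.duration, audio.duration, offset)
    # Trim the voiceover first, then place it on the video timeline.
    track = audio.set_duration(length).set_start(start)
    final = video.set_audio(CompositeAudioClip([track]))
    final.write_videofile(out_path, codec="libx264", audio_codec="aac")
```

For a 60-second render and a 75-second voiceover starting at 2 seconds, the plan keeps 58 seconds of audio, so the export ends with the picture instead of trailing audio.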

Frequently Asked Questions

Can I use this toolkit without coding experience?

Some coding is required. Remotion uses React, while Manim and MoviePy use Python. The learning curve is moderate. Pre-built templates can reduce the amount of custom code needed.

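Is this toolkit free?

Manim, MoviePy, and Whisper are free and open-source. Remotion is source-available and free for individuals and small teams, but larger companies need a paid company license, so check its license terms. ElevenLabs has a free tier with limited characters per month. Production usage of ElevenLabs requires a paid plan.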

Can I run the entire pipeline locally?

Yes. All tools except ElevenLabs run locally. For fully local voice synthesis, you can substitute ElevenLabs with open-source TTS models like Coqui or Bark, though quality may differ.

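What video formats does MoviePy support?

MoviePy supports MP4, AVI, MOV, WebM, GIF, and any format supported by FFmpeg. For MP4 output the default video codec is libx264. The audio codec defaults to libmp3lame (MP3) unless the output is WebM or OGV, so pass audio_codec='aac' to write_videofile if you need AAC audio.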
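
Setting codecs explicitly per container avoids relying on defaults. The sketch below is illustrative only: the codec pairings are assumptions that depend on how your FFmpeg build was compiled, and export_options is a hypothetical helper, not part of MoviePy.

```python
import os

# Assumed pairings; substitute any codec your FFmpeg build provides.
CODECS_BY_EXTENSION = {
    ".mp4": {"codec": "libx264", "audio_codec": "aac"},
    ".webm": {"codec": "libvpx", "audio_codec": "libvorbis"},
    ".ogv": {"codec": "libtheora", "audio_codec": "libvorbis"},
}


def export_options(out_path: str) -> dict:
    """Pick explicit codec keyword arguments from the output file extension."""
    ext = os.path.splitext(out_path)[1].lower()
    if ext not in CODECS_BY_EXTENSION:
        raise ValueError(f"no codec pairing configured for {ext!r}")
    # Return a copy so callers can adjust options without touching the table.
    return dict(CODECS_BY_EXTENSION[ext])
```

The result can be passed straight through, for example final.write_videofile(path, fps=30, **export_options(path)).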

How long does it take to render a one-minute video?

Render time depends on resolution and complexity. A simple 1080p video with Remotion takes 30-60 seconds on a modern machine. Manim animations may take longer depending on scene complexity.



Source & Thanks

This collection curates video AI tools, most of them open source. All assets link to their original repositories with full attribution.
