Scripts · May 30, 2026 · 1 min read

OmniVoice Studio — Open-Source Voice Cloning and TTS Desktop App

OmniVoice Studio is a self-hosted desktop application for voice cloning, text-to-speech, dubbing, and dictation. It runs entirely on your local machine, providing a privacy-first alternative to cloud-based voice synthesis services.

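Agent ready

Ready for direct agent installation

This asset is installable. The agent first selects the current runtime, checks the install plan, and then runs the matching command.

Native · 98/100 · Policy: allowed
Agent entry: any MCP/CLI agent
Type: Skill
Install: Single
Trust level: Established
Entry point: OmniVoice Studio
Direct install command:
npx -y tokrepo@latest install ad28d8d0-5c21-11f1-9bc6-00163e2b0d79 --target codex

Do a dry run to confirm the install plan before running this command.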

Introduction

OmniVoice Studio provides local voice cloning, text-to-speech synthesis, dubbing, and dictation capabilities without sending audio data to third-party servers. It targets developers and content creators who need high-quality voice generation while retaining full control over their data.

What OmniVoice Studio Does

  • Clones voices from short audio samples for personalized speech synthesis
  • Generates speech in multiple languages with natural intonation
  • Provides video dubbing with automatic lip-sync alignment
  • Offers real-time dictation and transcription via local speech recognition
  • Runs entirely on-device using local GPU acceleration

Architecture Overview

OmniVoice Studio is built as a Python desktop application with a web-based UI. It integrates multiple open-source TTS and ASR models, routing audio through a local inference pipeline. Voice cloning uses speaker embedding extraction paired with a multi-speaker synthesis model, while dubbing leverages forced alignment to match translated speech to video timing.
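
To make the dubbing step concrete, here is a minimal sketch of one way to fit translated speech into the timing of the original video. It is not taken from the OmniVoice Studio codebase: the `Segment` class, the `tempo_factor` function, and the clamp limits of 0.8 and 1.25 are all assumptions made for illustration. It assumes an alignment step has already produced a start and end time for each line.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One dubbed line: its slot in the source video and the synthesized clip length."""
    start: float         # slot start in the video, seconds
    end: float           # slot end in the video, seconds
    tts_duration: float  # length of the synthesized translated speech, seconds


def tempo_factor(seg: Segment, min_factor: float = 0.8, max_factor: float = 1.25) -> float:
    """Return the playback-speed multiplier that fits the clip into its slot.

    A factor above 1.0 speeds the speech up; below 1.0 slows it down.
    The limits are illustrative and keep the voice from being stretched too far.
    """
    slot = seg.end - seg.start
    if slot <= 0 or seg.tts_duration <= 0:
        raise ValueError("segment slot and clip duration must be positive")
    raw = seg.tts_duration / slot
    return max(min_factor, min(max_factor, raw))


# Hypothetical segments, as an alignment step might produce them.
segments = [
    Segment(start=0.0, end=2.0, tts_duration=2.2),   # slightly long: speed up
    Segment(start=2.5, end=5.0, tts_duration=2.25),  # slightly short: slow down
    Segment(start=5.0, end=6.0, tts_duration=3.0),   # far too long: clamped
]
factors = [tempo_factor(s) for s in segments]
for seg, factor in zip(segments, factors):
    print(f"{seg.start:.1f}-{seg.end:.1f}s -> tempo x{factor:.2f}")
```

When a factor hits the clamp, as in the third segment, speed adjustment alone cannot fit the line, and a real pipeline would need another strategy such as a shorter translation.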

Self-Hosting & Configuration

  • Requires Python 3.10+; a CUDA-capable GPU is recommended for optimal performance
  • Install dependencies via pip from the provided requirements file
  • Configure model paths and output directories in the settings panel
  • Supports Docker deployment for isolated environments
  • GPU memory requirements vary by model; 8 GB VRAM is recommended
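
The requirements above can be checked before installing. The following is a small preflight sketch, not part of OmniVoice Studio itself. The function names are hypothetical, and the 8000 MiB threshold is a tolerance chosen for this sketch, on the assumption that a nominal 8 GB card can report slightly less than 8192 MiB. It uses only the standard library and the `nvidia-smi` tool that ships with the NVIDIA driver.

```python
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 10)         # minimum version listed above
RECOMMENDED_VRAM_MIB = 8000  # roughly 8 GB; tolerance is an assumption of this sketch


def python_ok(version=None) -> bool:
    """True if the given (or running) interpreter meets the minimum version."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def parse_vram_mib(nvidia_smi_output: str) -> list[int]:
    """Parse one total-memory value in MiB per GPU, skipping non-numeric lines."""
    values = []
    for line in nvidia_smi_output.splitlines():
        text = line.strip()
        if text.isdigit():
            values.append(int(text))
    return values


def query_vram_mib() -> list[int]:
    """Ask nvidia-smi for each GPU's total memory; empty list if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_vram_mib(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    print("Python 3.10+:", "ok" if python_ok() else "too old")
    gpus = query_vram_mib()
    if not gpus:
        print("No NVIDIA GPU detected; expect slower CPU-only operation.")
    for index, mib in enumerate(gpus):
        verdict = "ok" if mib >= RECOMMENDED_VRAM_MIB else "below recommendation"
        print(f"GPU {index}: {mib} MiB ({verdict})")
```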

Key Features

  • Privacy-first design with zero cloud dependency
  • Multi-language TTS supporting dozens of languages
  • Voice cloning from as little as 10 seconds of reference audio
  • Built-in audio editor for post-processing generated speech
  • Extensible architecture supporting custom model backends
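
Since cloning is described as working from about 10 seconds of reference audio, it can help to check a clip's length before using it. The sketch below does this for uncompressed WAV files with Python's standard `wave` module. The helper names and the silent test clip are illustrative assumptions, not part of the application.

```python
import os
import tempfile
import wave

MIN_REFERENCE_SECONDS = 10.0  # the "as little as 10 seconds" figure above


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def long_enough(path: str, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """True if the clip is at least `minimum` seconds long."""
    return wav_duration_seconds(path) >= minimum


def write_silence(path: str, seconds: int, rate: int = 16000) -> None:
    """Write a mono 16-bit WAV of silence, used here only as a stand-in clip."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)


# Demo: create a 12-second stand-in clip and check it.
with tempfile.TemporaryDirectory() as tmp:
    clip = os.path.join(tmp, "reference.wav")
    write_silence(clip, seconds=12)
    demo_duration = wav_duration_seconds(clip)
    demo_ok = long_enough(clip)

print(f"{demo_duration:.1f} s, long enough: {demo_ok}")
```

Compressed formats such as MP3, FLAC, and OGG would need a third-party decoder to measure, which this sketch deliberately leaves out.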

Comparison with Similar Tools

  • ElevenLabs — cloud-based with usage limits and subscription costs; OmniVoice runs locally for free
  • Coqui TTS — library-focused without a desktop UI; OmniVoice provides an integrated application
  • Bark — generates audio with music and effects but lacks voice cloning; OmniVoice specializes in cloning
  • Fish Speech — strong multilingual TTS but no dubbing workflow; OmniVoice includes video dubbing
  • Kokoro — lightweight 82M model with limited customization; OmniVoice supports multiple model backends

FAQ

Q: Does OmniVoice Studio require an internet connection? A: No. All processing happens locally on your machine once models are downloaded.

Q: What GPU is needed to run OmniVoice Studio? A: An NVIDIA GPU with at least 8 GB VRAM is recommended. CPU-only mode works but is significantly slower.

Q: Can I use cloned voices commercially? A: The software is open source, but you are responsible for complying with applicable laws regarding voice cloning and consent.

Q: Which audio formats are supported? A: WAV, MP3, FLAC, and OGG are supported for both input and output.
