Scripts · April 2, 2026 · 1 min read

Coqui TTS — Deep Learning Text-to-Speech Engine

Generate speech in 1100+ languages with voice cloning. XTTS v2 streams with under 200ms latency. 44K+ GitHub stars.

TokRepo Featured · Community

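Agent ready

This asset is staged safely first. The copied instruction asks the agent to read the staged files and to confirm before activating scripts, MCP configuration, or global configuration.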

Stage only · 17/100 · Policy: staging required

Agent entry: any MCP/CLI agent

Type: Script

Install: Stage only

Trust level: Established

Entry: coqui-tts.md

Safe staging command:

npx -y tokrepo@latest install a059dce2-6275-4ea0-a57b-e885248d8e95 --target codex

This stages the files first; read the staged README and install plan before activating anything.

TL;DR

Coqui TTS generates speech in 1100+ languages with real-time voice cloning and sub-200ms streaming latency.

§01

What it is

Coqui TTS is an open-source deep learning text-to-speech engine that supports over 1100 languages. Its XTTS v2 model enables voice cloning from short audio samples with streaming output under 200ms latency. You can generate speech from text, clone voices, and fine-tune models on custom datasets.

Coqui TTS targets developers building voice interfaces, accessibility tools, content creation pipelines, and any application that needs high-quality synthesized speech without proprietary API costs.

§02

How it saves time or tokens

Coqui TTS runs locally, eliminating per-request API costs from cloud TTS services. The pre-trained models cover most languages out of the box. Voice cloning requires only a few seconds of reference audio, avoiding expensive studio recording sessions. The streaming API enables real-time voice output for interactive applications.

§03

How to use

Install Coqui TTS:

pip install TTS

Generate speech from the command line:

tts --text 'Hello, this is a test.' \
    --model_name tts_models/en/ljspeech/tacotron2-DDC \
    --out_path output.wav
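
When the CLI is driven from a script, building the argument list programmatically avoids shell-quoting problems in the text. The following is a minimal sketch, not part of Coqui TTS: `build_tts_command` and `run_tts` are hypothetical helper names, and only the three flags shown above are used.

```python
import subprocess


def build_tts_command(text, out_path,
                      model_name="tts_models/en/ljspeech/tacotron2-DDC"):
    """Return the argument list for the `tts` CLI call shown above."""
    # A list (rather than one shell string) keeps quotes and spaces
    # inside `text` from being reinterpreted by the shell.
    return [
        "tts",
        "--text", text,
        "--model_name", model_name,
        "--out_path", out_path,
    ]


def run_tts(text, out_path, **kwargs):
    """Run the CLI; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_tts_command(text, out_path, **kwargs), check=True)
```

Calling `run_tts("Hello, this is a test.", "output.wav")` should be equivalent to the command above, assuming the `tts` executable is on the PATH.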

Clone a voice with XTTS:

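from TTS.api import TTS

# Load the multilingual XTTS v2 model (weights are downloaded on first use)
tts = TTS('tts_models/multilingual/multi-dataset/xtts_v2')

# Synthesize the text in the voice of the reference recording
tts.tts_to_file(
    text='Hello, this is my cloned voice.',
    speaker_wav='reference_audio.wav',  # short, clean clip of the target speaker
    language='en',                      # language of the text to speak
    file_path='cloned_output.wav'
)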
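
For more than one line of text, it is usually preferable to load the model once and reuse it. The sketch below builds on the call above; `plan_outputs` and `clone_lines` are hypothetical helpers, and the `line_001.wav` naming scheme is an assumption made for illustration.

```python
from pathlib import Path


def plan_outputs(lines, out_dir="cloned"):
    """Pair each non-empty line with a numbered WAV path."""
    texts = [line.strip() for line in lines if line.strip()]
    return [(text, Path(out_dir) / f"line_{i:03d}.wav")
            for i, text in enumerate(texts, start=1)]


def clone_lines(lines, speaker_wav, language="en", out_dir="cloned"):
    """Synthesize every line in the reference speaker's voice."""
    # Imported lazily so plan_outputs() works without TTS installed.
    from TTS.api import TTS

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # Load the model once and reuse it for every line.
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    for text, path in plan_outputs(lines, out_dir):
        tts.tts_to_file(text=text, speaker_wav=speaker_wav,
                        language=language, file_path=str(path))
```

Keeping the planning step separate from synthesis makes it possible to preview the output file names before any model is loaded.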

§04

Example

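Synthesize speech with XTTS v2 and play it on the default audio device. This example synthesizes the full utterance first: tts.tts() returns the complete waveform, so playback does not start until synthesis has finished.

from TTS.api import TTS
import sounddevice as sd
import numpy as np

tts = TTS('tts_models/multilingual/multi-dataset/xtts_v2')

# tts() returns the whole utterance as a sequence of float samples
wav = tts.tts(
    text='Streaming text to speech in real time.',
    speaker_wav='reference.wav',
    language='en'
)

# XTTS v2 produces 24 kHz audio
sd.play(np.array(wav, dtype=np.float32), samplerate=24000)
sd.wait()  # block until playback finishes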
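
The example above waits for the whole text to be synthesized before any audio plays. One way to shorten the wait before the first audio, without relying on a streaming API, is to synthesize and play one sentence at a time. This is a minimal sketch under stated assumptions: `split_sentences` and `speak_in_chunks` are hypothetical helpers, the regex splitter is deliberately naive, and the approach is sentence-level chunking rather than true low-latency streaming.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def speak_in_chunks(text, synthesize, play):
    """Synthesize and play one sentence at a time.

    `synthesize(sentence)` returns a waveform and `play(waveform)` blocks
    until playback finishes; both callables are supplied by the caller.
    """
    for sentence in split_sentences(text):
        play(synthesize(sentence))
```

With the objects from the example above, `synthesize` could wrap `tts.tts(text=sentence, speaker_wav='reference.wav', language='en')` and `play` could call `sd.play(...)` followed by `sd.wait()`. Because each sentence is synthesized only after the previous one has finished playing, short pauses between sentences are to be expected.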

§05

Related on TokRepo

Voice tools — text-to-speech and voice AI resources
AI coding tools — developer tools and libraries

§06

Common pitfalls

XTTS v2 needs a GPU for usable inference speed. CPU inference works but is too slow for real-time applications.
Voice cloning quality depends on reference audio quality. Use clean, noise-free recordings of at least 6 seconds for best results.
Model downloads are large (several GB). Plan for storage and bandwidth when deploying to new environments.
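
The 6-second guideline for reference audio can be checked before a clip is passed to the model. The sketch below uses only the Python standard library and assumes an uncompressed PCM WAV file; `wav_duration_seconds` and `check_reference_clip` are hypothetical helper names, and the check covers length only, not noise or recording quality.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_reference_clip(path, min_seconds=6.0):
    """Return the clip duration, or raise ValueError if it is too short."""
    duration = wav_duration_seconds(path)
    if duration < min_seconds:
        raise ValueError(
            f"{path}: {duration:.1f}s is shorter than {min_seconds:.1f}s"
        )
    return duration
```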

FAQ

Does Coqui TTS work offline?

Yes. All models run locally after download. No internet connection or API key is needed for inference. This makes it suitable for on-premises and privacy-sensitive deployments.

How many languages does it support?

Coqui TTS supports over 1100 languages through its multilingual models. XTTS v2 specifically handles 17 languages with high quality. Other models cover additional languages.
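
Since XTTS v2 covers a fixed set of languages, it can be useful to validate the language code before starting a synthesis call. The sketch below is an illustration only: the 17 codes are listed as recalled from the XTTS v2 model card and should be treated as an assumption to verify against the installed model, and `normalize_language` is a hypothetical helper.

```python
# Assumption: language codes as recalled from the XTTS v2 model card.
XTTS_V2_LANGUAGES = frozenset({
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
})


def normalize_language(code):
    """Lower-case and validate a language code for XTTS v2."""
    normalized = code.strip().lower()
    if normalized not in XTTS_V2_LANGUAGES:
        supported = ", ".join(sorted(XTTS_V2_LANGUAGES))
        raise ValueError(
            f"Unsupported language {code!r}; expected one of: {supported}"
        )
    return normalized
```

The returned value can be passed as the `language` argument of `tts_to_file`. Languages outside this set would need one of the other models.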

Can I fine-tune models on custom data?

Yes. Coqui TTS provides training scripts for fine-tuning on custom datasets. You need transcribed audio data in the expected format. Fine-tuning XTTS requires a GPU with at least 16GB VRAM.
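
The answer above does not spell out the expected format. As an illustration only, the sketch below writes an LJSpeech-style `metadata.csv` (pipe-separated rows of clip id, raw text, and normalized text), which is one layout commonly used with TTS training scripts. Whether this layout matches a particular training recipe is an assumption to check against that recipe; `write_ljspeech_metadata` is a hypothetical helper.

```python
from pathlib import Path


def write_ljspeech_metadata(items, dataset_dir):
    """Write `metadata.csv` from (clip_id, transcript) pairs.

    Each row has three pipe-separated fields: id | text | normalized text.
    The transcript is reused for the normalized-text column.
    """
    dataset_dir = Path(dataset_dir)
    dataset_dir.mkdir(parents=True, exist_ok=True)
    rows = []
    for clip_id, text in items:
        # The pipe is the field separator, so it must not occur in the text.
        if "|" in text or "\n" in text:
            raise ValueError(f"Transcript for {clip_id} contains '|' or a newline")
        rows.append(f"{clip_id}|{text}|{text}")
    path = dataset_dir / "metadata.csv"
    path.write_text("\n".join(rows) + "\n", encoding="utf-8")
    return path
```

In the LJSpeech layout, the audio for a row with id `clip_0001` would sit next to the metadata file as `wavs/clip_0001.wav`.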

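What is the license?

Coqui TTS code is released under the Mozilla Public License 2.0. Individual model weights may carry their own licenses; the XTTS v2 weights, for example, are distributed under the Coqui Public Model License, which restricts commercial use. Check each model's license before commercial use.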

How does voice cloning work?

XTTS v2 takes a short reference audio clip (a clean recording of about 6 seconds or more works best) and extracts speaker characteristics. It then generates new speech in that voice from any text input. No training or fine-tuning is needed for zero-shot cloning.

Sources (3)

Coqui TTS GitHub: Coqui TTS supports 1100+ languages with XTTS v2 voice cloning
Coqui TTS README: XTTS v2 streams with under 200ms latency
Coqui TTS License: Mozilla Public License 2.0

Source and thanks

Created by Coqui AI. Licensed under MPL-2.0.

TTS: 44,900+ GitHub stars

Thanks to the Coqui AI team and community for building the most comprehensive open-source TTS toolkit.

Related assets

Parler-TTS — High-Quality Text-to-Speech Training and Inference Library

Zonos — Multilingual TTS with Voice Cloning

Tortoise TTS — Multi-Voice Text-to-Speech Focused on Quality

Deepgram Aura TTS — Text-to-Speech for Voice Agents