# Coqui TTS — Deep Learning Text-to-Speech Engine

> Generate speech in 1,100+ languages with voice cloning. XTTS v2 streams with under 200 ms latency. 44K+ GitHub stars.

## Install

```bash
pip install TTS
```

## Quick Use

```bash
# List available models
tts --list_models

# Generate speech from text (English)
tts --text "Hello, welcome to TokRepo." --out_path output.wav

# Use XTTS v2 for multilingual + voice cloning
tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
    --text "你好,欢迎来到TokRepo。" \
    --speaker_wav reference_voice.wav \
    --language_idx zh-cn \
    --out_path output_zh.wav
```

```python
from TTS.api import TTS

# Initialize XTTS v2
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cuda")

# Generate speech with voice cloning
tts.tts_to_file(
    text="Welcome to the future of AI voice.",
    speaker_wav="my_voice.wav",
    language="en",
    file_path="output.wav",
)
```

---

## Intro

Coqui TTS is the most comprehensive open-source text-to-speech library, with 44,900+ GitHub stars and pretrained models covering 1,100+ languages. Its flagship XTTS v2 model delivers production-quality multilingual speech, cloning a voice from just 6 seconds of reference audio and streaming with under 200 ms latency. The library implements every major TTS architecture — VITS, Tacotron 2, Glow-TTS, Bark, Tortoise — behind a unified Python API and CLI. While Coqui the company closed in 2023, the open-source project remains a go-to TTS toolkit for developers worldwide.

Works with: Python, CUDA GPUs, CPU (slower), any application via CLI or Python API. Best for developers adding voice to AI agents, chatbots, accessibility tools, or content creation pipelines. Setup time: under 3 minutes.
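One practical detail when feeding long passages to `tts_to_file`: XTTS v2 logs a warning when a single call exceeds its per-language character budget (around 250 characters for English). A minimal helper like the sketch below splits text into sentence-sized chunks first; the `chunk_text` function and the 250-character figure are illustrative, not part of the TTS API.

```python
import re

def chunk_text(text: str, max_chars: int = 250) -> list[str]:
    """Split text into sentence-aligned chunks no longer than max_chars.

    A single sentence longer than max_chars is kept whole rather than
    split mid-sentence (hypothetical helper, not part of Coqui TTS).
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized in its own `tts_to_file` call and the resulting WAV files concatenated.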
---

## Coqui TTS Model Zoo & Features

### Model Architectures

| Model | Type | Quality | Speed | Voice Clone |
|-------|------|---------|-------|-------------|
| **XTTS v2** | End-to-end | ★★★★★ | Fast (GPU) | ✅ 6s reference |
| **VITS** | End-to-end | ★★★★ | Very fast | ❌ |
| **YourTTS** | Multi-speaker | ★★★★ | Fast | ✅ |
| **Bark** | Generative | ★★★★ | Slow | ❌ (but expressive) |
| **Tortoise** | Diffusion | ★★★★★ | Very slow | ✅ |
| **Tacotron 2** | Spectrogram | ★★★ | Medium | ❌ |
| **Glow-TTS** | Flow-based | ★★★ | Fast | ❌ |

### XTTS v2 — Flagship Model

The recommended model for most use cases:

```python
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cuda")

# 16 supported languages
languages = ["en", "es", "fr", "de", "it", "pt", "pl", "tr",
             "ru", "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko"]

# Voice cloning from a 6-second reference
tts.tts_to_file(
    text="This is my cloned voice speaking.",
    speaker_wav="reference.wav",  # just 6 seconds needed
    language="en",
    file_path="cloned_output.wav",
)
```

Features:

- **16 languages** with natural prosody
- **Voice cloning** from just 6 seconds of reference audio
- **Streaming** with under 200 ms latency
- **Emotion preservation** from reference audio

### Streaming TTS

The high-level `TTS` wrapper does not expose a streaming method; streaming goes through the lower-level `Xtts` model and its `inference_stream` generator (checkpoint paths below are placeholders for your downloaded XTTS v2 files):

```python
import sounddevice as sd
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Load the XTTS v2 checkpoint directly
config = XttsConfig()
config.load_json("/path/to/xtts_v2/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts_v2/")
model.cuda()

# Compute speaker conditioning once from the reference clip
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
    audio_path=["reference.wav"]
)

# inference_stream yields audio chunks as they are generated
chunks = model.inference_stream(
    "This streams in real-time with very low latency.",
    "en",
    gpt_cond_latent,
    speaker_embedding,
)

for chunk in chunks:
    sd.play(chunk.cpu().numpy(), samplerate=24000)
    sd.wait()
```

### Fine-Tuning

Train on your own voice data. There is no `fine_tune()` method on the `TTS` API object; fine-tuning runs through the training recipes shipped in the repository, for example the XTTS GPT trainer recipe:

```bash
# Adjust the dataset and output paths inside the recipe first
python recipes/ljspeech/xtts_v2/train_gpt_xtts.py
```

### TTS Server

Run as a REST API:

```bash
tts-server --model_name \
    tts_models/multilingual/multi-dataset/xtts_v2 --port 5002
```

```bash
# Request speech over HTTP; the bundled demo server accepts GET with a text query
curl "http://localhost:5002/api/tts?text=Hello%20world" --output speech.wav
```

---

## FAQ

**Q: What is Coqui TTS?**
A: Coqui TTS is the most popular open-source text-to-speech library with 44,900+ GitHub stars, supporting 1,100+ languages, voice cloning, and multiple architectures (XTTS v2, VITS, Bark, Tortoise) via a unified Python API.

**Q: Is Coqui TTS still maintained after the company shut down?**
A: The company closed in 2023, but the open-source library continues to be widely used and community-maintained. XTTS v2 remains one of the best open-source TTS models available.

**Q: Is Coqui TTS free?**
A: The library code is open source under MPL-2.0 (Mozilla Public License 2.0) and free for commercial and non-commercial use. Note that some pretrained weights, including XTTS v2, ship under Coqui's own model license with their own usage terms.

---

## Source & Thanks

> Created by [Coqui AI](https://github.com/coqui-ai). Licensed under MPL-2.0.
>
> [TTS](https://github.com/coqui-ai/TTS) — ⭐ 44,900+

Thanks to the Coqui AI team and community for building the most comprehensive open-source TTS toolkit.

---

Source: https://tokrepo.com/en/workflows/a059dce2-6275-4ea0-a57b-e885248d8e95
Author: TokRepo精选
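
As a companion to the server section above, calling the running `tts-server` from Python takes only the standard library. This is a minimal sketch assuming the demo server's GET `/api/tts?text=...` interface; the `tts_url` and `synthesize` helpers are illustrative names, not part of Coqui TTS.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def tts_url(text: str, host: str = "http://localhost:5002") -> str:
    """Build the request URL for the demo server's GET /api/tts endpoint."""
    return f"{host}/api/tts?{urlencode({'text': text})}"

def synthesize(text: str, out_path: str) -> None:
    """Fetch WAV bytes from a running tts-server and save them to disk."""
    with urlopen(tts_url(text)) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
```

With the server from the section above running on port 5002, `synthesize("Hello world", "speech.wav")` writes the generated audio to `speech.wav`.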