Scripts · Apr 2, 2026 · 3 min read

Coqui TTS — Deep Learning Text-to-Speech Engine

Generate speech in 1100+ languages with voice cloning. XTTS v2 streams with under 200ms latency. 44K+ GitHub stars.

TokRepo Featured · Community
Quick Use

Use it first, then decide how deep to go

Everything you need to copy, install, and run first is in the block below.

```bash
pip install TTS
```

```bash
# List available models
tts --list_models

# Generate speech from text (English)
tts --text "Hello, welcome to TokRepo." --out_path output.wav

# Use XTTS v2 for multilingual + voice cloning
tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
    --text "你好,欢迎来到TokRepo。" \
    --speaker_wav reference_voice.wav \
    --language_idx zh-cn \
    --out_path output_zh.wav
```

```python
from TTS.api import TTS

# Initialize XTTS v2
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cuda")

# Generate speech with voice cloning
tts.tts_to_file(
    text="Welcome to the future of AI voice.",
    speaker_wav="my_voice.wav",
    language="en",
    file_path="output.wav",
)
```

---
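For long passages, a common pattern is to split the input into sentences and synthesize them one at a time rather than passing a single huge string. A minimal sketch of such a splitter, plain Python and not part of the TTS API (the `split_sentences` helper is an illustrative assumption):

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences on ., !, ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


# Each sentence can then be passed to tts.tts_to_file() in a loop,
# writing one WAV per chunk or concatenating the audio afterwards.
```

This keeps each synthesis call short, which tends to make latency more predictable.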
Intro
Coqui TTS is the most comprehensive open-source text-to-speech library, with 44,900+ GitHub stars and support for 1,100+ languages via pretrained models. Its flagship XTTS v2 model delivers production-quality multilingual speech, with voice cloning from just 6 seconds of reference audio and streaming at under 200ms latency. The library implements every major TTS architecture — VITS, Tacotron 2, Glow-TTS, Bark, Tortoise — behind a unified Python API and CLI. While Coqui the company closed in 2023, the open-source project remains the go-to TTS toolkit for developers worldwide.

Works with: Python, CUDA GPUs, CPU (slower), any application via CLI or Python API.

Best for developers adding voice to AI agents, chatbots, accessibility tools, or content creation pipelines. Setup time: under 3 minutes.

---
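Since the library runs on both CUDA GPUs and CPU, a small device-selection shim keeps the same script working on either. A minimal sketch (the `pick_device` helper is an illustrative name, not part of the TTS API):

```python
def pick_device(cuda_available: bool) -> str:
    """Choose the runtime device: CUDA GPU when present, CPU as fallback."""
    return "cuda" if cuda_available else "cpu"


# Typical wiring (requires torch and TTS installed):
#   import torch
#   from TTS.api import TTS
#   device = pick_device(torch.cuda.is_available())
#   tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
```

Expect noticeably slower synthesis on CPU, especially for XTTS v2.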
## Coqui TTS Model Zoo & Features

### Model Architectures

| Model | Type | Quality | Speed | Voice Clone |
|-------|------|---------|-------|-------------|
| **XTTS v2** | End-to-end | ★★★★★ | Fast (GPU) | ✅ 6s reference |
| **VITS** | End-to-end | ★★★★ | Very fast | ❌ |
| **YourTTS** | Multi-speaker | ★★★★ | Fast | ✅ |
| **Bark** | Generative | ★★★★ | Slow | ❌ (but expressive) |
| **Tortoise** | Diffusion | ★★★★★ | Very slow | ✅ |
| **Tacotron 2** | Spectrogram | ★★★ | Medium | ❌ |
| **Glow-TTS** | Flow-based | ★★★ | Fast | ❌ |

### XTTS v2 — Flagship Model

The recommended model for most use cases:

```python
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cuda")

# 16 supported languages
languages = ["en", "es", "fr", "de", "it", "pt", "pl", "tr",
             "ru", "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko"]

# Voice cloning from a 6-second reference clip
tts.tts_to_file(
    text="This is my cloned voice speaking.",
    speaker_wav="reference.wav",  # just ~6 seconds needed
    language="en",
    file_path="cloned_output.wav",
)
```

Features:

- **16 languages** with natural prosody
- **Voice cloning** from just 6 seconds of reference audio
- **Streaming** with under 200ms latency
- **Emotion preservation** from reference audio

### Streaming TTS

The high-level `TTS` API does not expose a streaming method; streaming goes through the underlying `Xtts` model's `inference_stream`, after loading the checkpoint directly (the paths below assume you point at the downloaded XTTS v2 model directory):

```python
import sounddevice as sd
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Load the XTTS v2 checkpoint directly
config = XttsConfig()
config.load_json("xtts_v2/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="xtts_v2/")
model.cuda()

# Compute speaker conditioning from the reference clip
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
    audio_path=["reference.wav"]
)

# Stream audio chunks in near real-time
chunks = model.inference_stream(
    "This streams in real-time with very low latency.",
    "en",
    gpt_cond_latent,
    speaker_embedding,
)
for chunk in chunks:
    sd.play(chunk.cpu().numpy(), samplerate=24000)
    sd.wait()
```

### Fine-Tuning

There is no `fine_tune` method on `TTS.api.TTS`; training on your own voice data goes through the recipes and trainer scripts shipped in the repository (see `TTS/recipes/` in the repo), driven by a config describing your dataset and model:

```bash
# Train or fine-tune from a config file
CUDA_VISIBLE_DEVICES="0" python TTS/bin/train_tts.py --config_path config.json
```

### TTS Server

Run as a REST API:

```bash
tts-server --model_name tts_models/multilingual/multi-dataset/xtts_v2 --port 5002
```

```bash
# Request synthesis; the server reads `text` from query/form parameters
curl "http://localhost:5002/api/tts?text=Hello%20world" --output speech.wav
```

---

## FAQ

**Q: What is Coqui TTS?**
A: Coqui TTS is the most popular open-source text-to-speech library, with 44,900+ GitHub stars, support for 1,100+ languages, voice cloning, and multiple architectures (XTTS v2, VITS, Bark, Tortoise) behind a unified Python API.

**Q: Is Coqui TTS still maintained after the company shut down?**
A: The company closed in 2023, but the open-source library continues to be widely used and community-maintained. XTTS v2 remains one of the best open-source TTS models available.

**Q: Is Coqui TTS free?**
A: Yes, open-source under MPL-2.0 (Mozilla Public License). Free for commercial and non-commercial use.

---
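The REST endpoint above can also be called from Python with only the standard library. A minimal client sketch, assuming the server reads `text` from query parameters as shown in the curl example (`tts_url` and `fetch_speech` are illustrative names, not part of the library):

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def tts_url(text: str, base: str = "http://localhost:5002/api/tts") -> str:
    """Build the synthesis request URL with the text URL-encoded."""
    return f"{base}?{urlencode({'text': text})}"


def fetch_speech(text: str, out_path: str = "speech.wav") -> None:
    """Request synthesis and save the returned WAV (server must be running)."""
    with urlopen(tts_url(text)) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

`urlencode` handles spaces and non-ASCII text, so callers don't need to escape anything by hand.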
🙏

Source & Thanks

> Created by [Coqui AI](https://github.com/coqui-ai). Licensed under MPL-2.0.
>
> [TTS](https://github.com/coqui-ai/TTS) — ⭐ 44,900+

Thanks to the Coqui AI team and community for building the most comprehensive open-source TTS toolkit.

