Scripts · Apr 2, 2026 · 3 min read

Coqui TTS — Deep Learning Text-to-Speech Engine

Generate speech in 1100+ languages with voice cloning. XTTS v2 streams with under 200ms latency. 44K+ GitHub stars.

TokRepo Featured · Community
Quick Use

Use it first, then decide how deep to go

Everything you need to copy, install, and run first is in the block below.

```bash
pip install TTS
```

```bash
# List available models
tts --list_models

# Generate speech from text (English)
tts --text "Hello, welcome to TokRepo." --out_path output.wav

# Use XTTS v2 for multilingual + voice cloning
tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
    --text "你好,欢迎来到TokRepo。" \
    --speaker_wav reference_voice.wav \
    --language_idx zh-cn \
    --out_path output_zh.wav
```

```python
from TTS.api import TTS

# Initialize XTTS v2
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cuda")

# Generate speech with voice cloning
tts.tts_to_file(
    text="Welcome to the future of AI voice.",
    speaker_wav="my_voice.wav",
    language="en",
    file_path="output.wav",
)
```

---
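For long passages, a common pattern is to split the input into sentences and synthesize them one at a time rather than passing a single huge string. A minimal sketch of such a splitter, plain Python and not part of the TTS API (the `split_sentences` helper is an illustrative assumption):

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences on ., !, ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


# Each sentence can then be passed to tts.tts_to_file() in a loop,
# writing one WAV per chunk or concatenating the audio afterwards.
```

This keeps each synthesis call short, which tends to make latency more predictable.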
Intro
Coqui TTS is the most comprehensive open-source text-to-speech library, with 44,900+ GitHub stars and support for 1,100+ languages via pretrained models. Its flagship XTTS v2 model delivers production-quality multilingual speech, with voice cloning from just 6 seconds of reference audio and streaming at under 200ms latency. The library implements every major TTS architecture — VITS, Tacotron 2, Glow-TTS, Bark, Tortoise — behind a unified Python API and CLI. While Coqui the company closed in 2023, the open-source project remains the go-to TTS toolkit for developers worldwide.

Works with: Python, CUDA GPUs, CPU (slower), any application via CLI or Python API.

Best for developers adding voice to AI agents, chatbots, accessibility tools, or content creation pipelines. Setup time: under 3 minutes.

---
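Since the library runs on both CUDA GPUs and CPU, a small device-selection shim keeps the same script working on either. A minimal sketch (the `pick_device` helper is an illustrative name, not part of the TTS API):

```python
def pick_device(cuda_available: bool) -> str:
    """Choose the runtime device: CUDA GPU when present, CPU as fallback."""
    return "cuda" if cuda_available else "cpu"


# Typical wiring (requires torch and TTS installed):
#   import torch
#   from TTS.api import TTS
#   device = pick_device(torch.cuda.is_available())
#   tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
```

Expect noticeably slower synthesis on CPU, especially for XTTS v2.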
## Coqui TTS Model Zoo & Features

### Model Architectures

| Model | Type | Quality | Speed | Voice Clone |
|-------|------|---------|-------|-------------|
| **XTTS v2** | End-to-end | ★★★★★ | Fast (GPU) | ✅ 6s reference |
| **VITS** | End-to-end | ★★★★ | Very fast | ❌ |
| **YourTTS** | Multi-speaker | ★★★★ | Fast | ✅ |
| **Bark** | Generative | ★★★★ | Slow | ❌ (but expressive) |
| **Tortoise** | Diffusion | ★★★★★ | Very slow | ✅ |
| **Tacotron 2** | Spectrogram | ★★★ | Medium | ❌ |
| **Glow-TTS** | Flow-based | ★★★ | Fast | ❌ |

### XTTS v2 — Flagship Model

The recommended model for most use cases:

```python
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cuda")

# 16 supported languages
languages = ["en", "es", "fr", "de", "it", "pt", "pl", "tr",
             "ru", "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko"]

# Voice cloning from a 6-second reference clip
tts.tts_to_file(
    text="This is my cloned voice speaking.",
    speaker_wav="reference.wav",  # just ~6 seconds needed
    language="en",
    file_path="cloned_output.wav",
)
```

Features:

- **16 languages** with natural prosody
- **Voice cloning** from just 6 seconds of reference audio
- **Streaming** with under 200ms latency
- **Emotion preservation** from reference audio

### Streaming TTS

The high-level `TTS` API does not expose a streaming method; streaming goes through the underlying `Xtts` model's `inference_stream`, after loading the checkpoint directly (the paths below assume you point at the downloaded XTTS v2 model directory):

```python
import sounddevice as sd
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Load the XTTS v2 checkpoint directly
config = XttsConfig()
config.load_json("xtts_v2/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="xtts_v2/")
model.cuda()

# Compute speaker conditioning from the reference clip
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
    audio_path=["reference.wav"]
)

# Stream audio chunks in near real-time
chunks = model.inference_stream(
    "This streams in real-time with very low latency.",
    "en",
    gpt_cond_latent,
    speaker_embedding,
)
for chunk in chunks:
    sd.play(chunk.cpu().numpy(), samplerate=24000)
    sd.wait()
```

### Fine-Tuning

There is no `fine_tune` method on `TTS.api.TTS`; training on your own voice data goes through the recipes and trainer scripts shipped in the repository (see `TTS/recipes/` in the repo), driven by a config describing your dataset and model:

```bash
# Train or fine-tune from a config file
CUDA_VISIBLE_DEVICES="0" python TTS/bin/train_tts.py --config_path config.json
```

### TTS Server

Run as a REST API:

```bash
tts-server --model_name tts_models/multilingual/multi-dataset/xtts_v2 --port 5002
```

```bash
# Request synthesis; the server reads `text` from query/form parameters
curl "http://localhost:5002/api/tts?text=Hello%20world" --output speech.wav
```

---

## FAQ

**Q: What is Coqui TTS?**
A: Coqui TTS is the most popular open-source text-to-speech library, with 44,900+ GitHub stars, support for 1,100+ languages, voice cloning, and multiple architectures (XTTS v2, VITS, Bark, Tortoise) behind a unified Python API.

**Q: Is Coqui TTS still maintained after the company shut down?**
A: The company closed in 2023, but the open-source library continues to be widely used and community-maintained. XTTS v2 remains one of the best open-source TTS models available.

**Q: Is Coqui TTS free?**
A: Yes, open-source under MPL-2.0 (Mozilla Public License). Free for commercial and non-commercial use.

---
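The REST endpoint above can also be called from Python with only the standard library. A minimal client sketch, assuming the server reads `text` from query parameters as shown in the curl example (`tts_url` and `fetch_speech` are illustrative names, not part of the library):

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def tts_url(text: str, base: str = "http://localhost:5002/api/tts") -> str:
    """Build the synthesis request URL with the text URL-encoded."""
    return f"{base}?{urlencode({'text': text})}"


def fetch_speech(text: str, out_path: str = "speech.wav") -> None:
    """Request synthesis and save the returned WAV (server must be running)."""
    with urlopen(tts_url(text)) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

`urlencode` handles spaces and non-ASCII text, so callers don't need to escape anything by hand.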
🙏

Source & Thanks

> Created by [Coqui AI](https://github.com/coqui-ai). Licensed under MPL-2.0.
>
> [TTS](https://github.com/coqui-ai/TTS) — ⭐ 44,900+

Thanks to the Coqui AI team and community for building the most comprehensive open-source TTS toolkit.

