# Coqui TTS — Deep Learning Text-to-Speech Engine

> Generate speech in 1,100+ languages with voice cloning. XTTS v2 streams with under 200 ms latency. 44K+ GitHub stars.

## Install

```bash
pip install TTS
```

## Quick Use

```bash
# List available models
tts --list_models

# Generate speech from text (English)
tts --text "Hello, welcome to TokRepo." --out_path output.wav

# Use XTTS v2 for multilingual + voice cloning
tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
    --text "你好,欢迎来到TokRepo。" \
    --speaker_wav reference_voice.wav \
    --language_idx zh-cn \
    --out_path output_zh.wav
```

```python
from TTS.api import TTS

# Initialize XTTS v2
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cuda")

# Generate speech with voice cloning
tts.tts_to_file(
    text="Welcome to the future of AI voice.",
    speaker_wav="my_voice.wav",
    language="en",
    file_path="output.wav",
)
```

---

## Intro

Coqui TTS is the most comprehensive open-source text-to-speech library, with 44,900+ GitHub stars and pretrained models covering 1,100+ languages. Its flagship XTTS v2 model delivers production-quality multilingual speech, cloning a voice from just 6 seconds of reference audio and streaming with under 200 ms latency. The library implements every major TTS architecture — VITS, Tacotron 2, Glow-TTS, Bark, Tortoise — behind a unified Python API and CLI. While Coqui the company closed in 2023, the open-source project remains a go-to TTS toolkit for developers worldwide.

Works with: Python, CUDA GPUs, CPU (slower), any application via CLI or Python API. Best for developers adding voice to AI agents, chatbots, accessibility tools, or content creation pipelines. Setup time: under 3 minutes.
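One practical detail when feeding long passages to `tts_to_file`: XTTS v2 logs a warning when a single call exceeds its per-language character budget (around 250 characters for English). A minimal helper like the sketch below splits text into sentence-sized chunks first; the `chunk_text` function and the 250-character figure are illustrative, not part of the TTS API.

```python
import re

def chunk_text(text: str, max_chars: int = 250) -> list[str]:
    """Split text into sentence-aligned chunks no longer than max_chars.

    A single sentence longer than max_chars is kept whole rather than
    split mid-sentence (hypothetical helper, not part of Coqui TTS).
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized in its own `tts_to_file` call and the resulting WAV files concatenated.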
---

## Coqui TTS Model Zoo & Features

### Model Architectures

| Model | Type | Quality | Speed | Voice Clone |
|-------|------|---------|-------|-------------|
| **XTTS v2** | End-to-end | ★★★★★ | Fast (GPU) | ✅ 6s reference |
| **VITS** | End-to-end | ★★★★ | Very fast | ❌ |
| **YourTTS** | Multi-speaker | ★★★★ | Fast | ✅ |
| **Bark** | Generative | ★★★★ | Slow | ❌ (but expressive) |
| **Tortoise** | Diffusion | ★★★★★ | Very slow | ✅ |
| **Tacotron 2** | Spectrogram | ★★★ | Medium | ❌ |
| **Glow-TTS** | Flow-based | ★★★ | Fast | ❌ |

### XTTS v2 — Flagship Model

The recommended model for most use cases:

```python
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cuda")

# 16 supported languages
languages = ["en", "es", "fr", "de", "it", "pt", "pl", "tr",
             "ru", "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko"]

# Voice cloning from a 6-second reference
tts.tts_to_file(
    text="This is my cloned voice speaking.",
    speaker_wav="reference.wav",  # just 6 seconds needed
    language="en",
    file_path="cloned_output.wav",
)
```

Features:

- **16 languages** with natural prosody
- **Voice cloning** from just 6 seconds of reference audio
- **Streaming** with under 200 ms latency
- **Emotion preservation** from reference audio

### Streaming TTS

The high-level `TTS` wrapper does not expose a streaming method; streaming goes through the lower-level `Xtts` model and its `inference_stream` generator (checkpoint paths below are placeholders for your downloaded XTTS v2 files):

```python
import sounddevice as sd
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Load the XTTS v2 checkpoint directly
config = XttsConfig()
config.load_json("/path/to/xtts_v2/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts_v2/")
model.cuda()

# Compute speaker conditioning once from the reference clip
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
    audio_path=["reference.wav"]
)

# inference_stream yields audio chunks as they are generated
chunks = model.inference_stream(
    "This streams in real-time with very low latency.",
    "en",
    gpt_cond_latent,
    speaker_embedding,
)

for chunk in chunks:
    sd.play(chunk.cpu().numpy(), samplerate=24000)
    sd.wait()
```

### Fine-Tuning

Train on your own voice data. There is no `fine_tune()` method on the `TTS` API object; fine-tuning runs through the training recipes shipped in the repository, for example the XTTS GPT trainer recipe:

```bash
# Adjust the dataset and output paths inside the recipe first
python recipes/ljspeech/xtts_v2/train_gpt_xtts.py
```

### TTS Server

Run as a REST API:

```bash
tts-server --model_name \
    tts_models/multilingual/multi-dataset/xtts_v2 --port 5002
```

```bash
# Request speech over HTTP; the bundled demo server accepts GET with a text query
curl "http://localhost:5002/api/tts?text=Hello%20world" --output speech.wav
```

---

## FAQ

**Q: What is Coqui TTS?**
A: Coqui TTS is the most popular open-source text-to-speech library with 44,900+ GitHub stars, supporting 1,100+ languages, voice cloning, and multiple architectures (XTTS v2, VITS, Bark, Tortoise) via a unified Python API.

**Q: Is Coqui TTS still maintained after the company shut down?**
A: The company closed in 2023, but the open-source library continues to be widely used and community-maintained. XTTS v2 remains one of the best open-source TTS models available.

**Q: Is Coqui TTS free?**
A: The library code is open source under MPL-2.0 (Mozilla Public License 2.0) and free for commercial and non-commercial use. Note that some pretrained weights, including XTTS v2, ship under Coqui's own model license with their own usage terms.

---

## Source & Thanks

> Created by [Coqui AI](https://github.com/coqui-ai). Licensed under MPL-2.0.
>
> [TTS](https://github.com/coqui-ai/TTS) — ⭐ 44,900+

Thanks to the Coqui AI team and community for building the most comprehensive open-source TTS toolkit.

---

Source: https://tokrepo.com/en/workflows/a059dce2-6275-4ea0-a57b-e885248d8e95
Author: TokRepo精选
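
As a companion to the server section above, calling the running `tts-server` from Python takes only the standard library. This is a minimal sketch assuming the demo server's GET `/api/tts?text=...` interface; the `tts_url` and `synthesize` helpers are illustrative names, not part of Coqui TTS.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def tts_url(text: str, host: str = "http://localhost:5002") -> str:
    """Build the request URL for the demo server's GET /api/tts endpoint."""
    return f"{host}/api/tts?{urlencode({'text': text})}"

def synthesize(text: str, out_path: str) -> None:
    """Fetch WAV bytes from a running tts-server and save them to disk."""
    with urlopen(tts_url(text)) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
```

With the server from the section above running on port 5002, `synthesize("Hello world", "speech.wav")` writes the generated audio to `speech.wav`.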