Scripts · Apr 2, 2026 · 3 min read

Coqui TTS — Deep Learning Text-to-Speech Engine

Generate speech in 1,100+ languages with voice cloning. XTTS v2 streams with under 200ms latency. 44K+ GitHub stars.

Introduction

Coqui TTS is one of the most comprehensive open-source text-to-speech libraries, with 44,900+ GitHub stars and pretrained models covering 1,100+ languages. Its flagship XTTS v2 model delivers production-quality multilingual speech, cloning a voice from just 6 seconds of reference audio and streaming with under 200ms latency. The library implements every major TTS architecture (VITS, Tacotron 2, Glow-TTS, Bark, Tortoise) behind a unified Python API and CLI. While the Coqui company shut down in 2023, the open-source project remains a go-to TTS toolkit for developers worldwide.

Works with: Python, CUDA GPUs, CPU (slower), any application via CLI or Python API. Best for developers adding voice to AI agents, chatbots, accessibility tools, or content creation pipelines. Setup time: under 3 minutes.


Coqui TTS Model Zoo & Features

Model Architectures

| Model      | Type          | Quality | Speed      | Voice Clone         |
|------------|---------------|---------|------------|---------------------|
| XTTS v2    | End-to-end    | ★★★★★   | Fast (GPU) | ✅ 6s reference      |
| VITS       | End-to-end    | ★★★★    | Very fast  | ❌                   |
| YourTTS    | Multi-speaker | ★★★★    | Fast       | ✅ zero-shot         |
| Bark       | Generative    | ★★★★    | Slow       | ❌ (but expressive)  |
| Tortoise   | Diffusion     | ★★★★★   | Very slow  | ✅ multiple clips    |
| Tacotron 2 | Spectrogram   | ★★★     | Medium     | ❌                   |
| Glow-TTS   | Flow-based    | ★★★     | Fast       | ❌                   |
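Every model in the zoo is addressed by a four-part ID of the form type/language/dataset/model, which is the string you pass to the `TTS` constructor. The `parse_model_name` helper below is illustrative only, not part of the library:

```python
def parse_model_name(model_id: str) -> dict:
    """Split a Coqui model ID of the form type/language/dataset/model."""
    model_type, language, dataset, model = model_id.split("/")
    return {
        "type": model_type,    # e.g. tts_models or vocoder_models
        "language": language,  # e.g. multilingual, en
        "dataset": dataset,    # e.g. multi-dataset, ljspeech
        "model": model,        # e.g. xtts_v2, vits
    }

parse_model_name("tts_models/multilingual/multi-dataset/xtts_v2")
# → {'type': 'tts_models', 'language': 'multilingual',
#    'dataset': 'multi-dataset', 'model': 'xtts_v2'}
```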

XTTS v2 — Flagship Model

The recommended model for most use cases:

from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cuda")

# 16 supported languages
languages = ["en", "es", "fr", "de", "it", "pt", "pl", "tr",
             "ru", "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko"]

# Voice cloning from 6-second reference
tts.tts_to_file(
    text="This is my cloned voice speaking.",
    speaker_wav="reference.wav",  # Just 6 seconds needed
    language="en",
    file_path="cloned_output.wav"
)

Features:

  • 16 languages with natural prosody
  • Voice cloning from just 6 seconds of reference audio
  • Streaming with under 200ms latency
  • Emotion preservation from reference audio
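Passing a code outside this set to `language=` fails at synthesis time, so a small guard up front gives clearer errors. `validate_language` is a hypothetical helper, not part of the TTS API:

```python
# Supported XTTS v2 language codes (from the model card)
XTTS_V2_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr",
    "ru", "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko",
}

def validate_language(code: str) -> str:
    """Normalize a language code and reject unsupported ones."""
    normalized = code.strip().lower()
    if normalized not in XTTS_V2_LANGUAGES:
        raise ValueError(
            f"XTTS v2 does not support '{code}'; "
            f"choose one of {sorted(XTTS_V2_LANGUAGES)}"
        )
    return normalized

validate_language("EN")  # returns "en"
```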

Streaming TTS

Streaming goes through the lower-level Xtts model API; the high-level TTS wrapper does not expose a streaming call. The checkpoint paths below are placeholders for a local XTTS v2 download:

import sounddevice as sd
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Load XTTS v2 from a local checkpoint directory
config = XttsConfig()
config.load_json("xtts_v2/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="xtts_v2/")
model.cuda()

# Compute voice-cloning conditioning from the reference audio
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
    audio_path=["reference.wav"]
)

# Stream audio chunks in real time (24 kHz output)
chunks = model.inference_stream(
    "This streams in real-time with very low latency.",
    "en",
    gpt_cond_latent,
    speaker_embedding,
)

for chunk in chunks:
    sd.play(chunk.squeeze().cpu().numpy(), samplerate=24000)
    sd.wait()
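Whether streaming keeps up is simple arithmetic: a chunk of N samples at the model's 24 kHz output rate plays for N / 24000 seconds, so each chunk must be generated faster than the previous one plays. A sketch with made-up chunk sizes:

```python
SAMPLE_RATE = 24_000  # XTTS v2 output sample rate in Hz

def chunk_duration_ms(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Playback time of one audio chunk, in milliseconds."""
    return num_samples / sample_rate * 1000.0

# Hypothetical chunk sizes (in samples) from one streaming run
chunk_sizes = [4096, 4096, 2048]
playback_ms = [chunk_duration_ms(n) for n in chunk_sizes]
# A 4096-sample chunk plays for ~170.7 ms, so generating each
# chunk in under that time keeps the stream gap-free.
```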

Fine-Tuning

Train on your own voice data. The Python API does not expose a one-call fine-tuning method; XTTS v2 fine-tuning runs through the training recipes shipped in the repository:

# Install from source to get the training recipes
git clone https://github.com/coqui-ai/TTS
cd TTS && pip install -e .

# Fine-tune the XTTS v2 GPT component; edit the dataset paths inside the recipe
python recipes/ljspeech/xtts_v2/train_gpt_xtts.py
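Coqui's dataset loaders commonly consume an LJSpeech-style layout: a wavs/ directory of clips plus a pipe-delimited metadata.csv with file ID, raw text, and normalized text per row. A sketch that writes such a file (clip names and transcripts here are made up):

```python
import csv
import tempfile
from pathlib import Path

# Made-up rows: (file_id, raw_text, normalized_text)
rows = [
    ("clip_0001", "Hello there!", "hello there"),
    ("clip_0002", "Voice cloning is fun.", "voice cloning is fun"),
]

dataset = Path(tempfile.mkdtemp())
(dataset / "wavs").mkdir()  # one clip_XXXX.wav per metadata row goes here

# LJSpeech-style metadata: file_id|raw_text|normalized_text
with open(dataset / "metadata.csv", "w", newline="") as f:
    csv.writer(f, delimiter="|").writerows(rows)
```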

TTS Server

Run as a REST API:

tts-server --model_name tts_models/multilingual/multi-dataset/xtts_v2 --port 5002
# GET with the text in the query string; the response body is WAV audio
# (multi-speaker models also accept a speaker_id parameter)
curl "http://localhost:5002/api/tts?text=Hello%20world&language_id=en" \
  --output speech.wav
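The endpoint can also be called from Python with the standard library alone. `build_tts_url` is an illustrative helper, and the exact query parameters accepted (text, language_id, speaker_id) can vary by tts-server version:

```python
from urllib.parse import urlencode
from urllib.request import urlopen  # used once the server is running

def build_tts_url(base: str, text: str, language_id: str = "en") -> str:
    """Build a GET URL for the tts-server /api/tts endpoint."""
    query = urlencode({"text": text, "language_id": language_id})
    return f"{base}/api/tts?{query}"

url = build_tts_url("http://localhost:5002", "Hello world")
# urlopen(url).read() then returns the synthesized WAV bytes
```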

FAQ

Q: What is Coqui TTS? A: Coqui TTS is the most popular open-source text-to-speech library with 44,900+ GitHub stars, supporting 1,100+ languages, voice cloning, and multiple architectures (XTTS v2, VITS, Bark, Tortoise) via a unified Python API.

Q: Is Coqui TTS still maintained after the company shut down? A: The company closed in 2023, but the open-source library continues to be widely used and community-maintained. XTTS v2 remains one of the best open-source TTS models available.

Q: Is Coqui TTS free? A: Yes, open-source under MPL-2.0 (Mozilla Public License). Free for commercial and non-commercial use.


Source and acknowledgements

Created by Coqui AI. Licensed under MPL-2.0.

TTS — ⭐ 44,900+

Thanks to the Coqui AI team and community for building the most comprehensive open-source TTS toolkit.
