ChatTTS — Expressive Text-to-Speech for Dialogue
Generate natural conversational speech with laughter, pauses, and emotion. Optimized for dialogue scenarios. 39K+ GitHub stars.
What it is
ChatTTS is an open-source text-to-speech model designed specifically for dialogue scenarios. Unlike standard TTS systems that produce flat, robotic speech, ChatTTS generates expressive audio with laughter, pauses, interjections, and emotional variation. It supports both English and Chinese, and can produce speech that sounds like a natural conversation.
It targets developers building chatbots, voice assistants, podcast generators, and any application where AI-generated speech needs to sound human and conversational.
How it saves time or tokens
ChatTTS eliminates the need for expensive commercial TTS APIs for conversational use cases. The model runs locally, so there are no per-character costs or API rate limits. It generates audio from text in seconds, and the expressive controls (laughter, pauses) are embedded via text tokens rather than requiring separate audio processing pipelines.
How to use
- Install and set up:
pip install ChatTTS
- Generate speech:
import ChatTTS
import torch
import torchaudio
chat = ChatTTS.Chat()
chat.load(compile=False)  # Downloads model weights on first run
texts = ['Hey, have you tried this new AI tool? It is amazing.']
wavs = chat.infer(texts)
# infer() returns NumPy arrays; torchaudio.save expects a 2-D (channels, samples) tensor
wav = torch.from_numpy(wavs[0])
if wav.dim() == 1:
    wav = wav.unsqueeze(0)
torchaudio.save('output.wav', wav, 24000)
- Add expressiveness with control tokens:
# Use special tokens such as [laugh] directly in the text
texts = ['So I tried to deploy it and [laugh] it actually worked on the first try.']
wavs = chat.infer(texts)
wav = torch.from_numpy(wavs[0])
if wav.dim() == 1:
    wav = wav.unsqueeze(0)
torchaudio.save('expressive.wav', wav, 24000)
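The project's README describes refine-text prompts of the form [oral_(0-9)], [laugh_(0-2)], and [break_(0-7)] for tuning expressiveness globally, in addition to inline tokens like [laugh]. A small helper to compose such a prompt string (the helper `build_refine_prompt` is our own convenience wrapper, not part of the ChatTTS API):

```python
# Sketch: compose a ChatTTS refine-text prompt string. The token format
# and value ranges follow the project's README; build_refine_prompt is
# our own hypothetical helper.
def build_refine_prompt(oral: int = 2, laugh: int = 0, break_: int = 6) -> str:
    """Return a prompt like '[oral_2][laugh_0][break_6]' for RefineTextParams."""
    for name, value, hi in (('oral', oral, 9), ('laugh', laugh, 2), ('break', break_, 7)):
        if not 0 <= value <= hi:
            raise ValueError(f'{name} must be in 0..{hi}, got {value}')
    return f'[oral_{oral}][laugh_{laugh}][break_{break_}]'

# Usage (requires ChatTTS to be installed; shown for context):
# params = ChatTTS.Chat.RefineTextParams(prompt=build_refine_prompt(oral=3, laugh=1))
# wavs = chat.infer(texts, params_refine_text=params)
```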
Example
import ChatTTS
import torch
import torchaudio

chat = ChatTTS.Chat()
chat.load(compile=False)

# Generate a dialogue with two distinct speakers
speaker_a = chat.sample_random_speaker()
speaker_b = chat.sample_random_speaker()

lines = [
    ('What do you think about using AI for code review?', speaker_a),
    ('Honestly? [laugh] It catches things I miss all the time.', speaker_b),
    ('Same here. The false positive rate is still annoying though.', speaker_a),
]

for i, (text, speaker) in enumerate(lines):
    params = ChatTTS.Chat.InferCodeParams(spk_emb=speaker)
    wavs = chat.infer([text], params_infer_code=params)
    wav = torch.from_numpy(wavs[0])
    if wav.dim() == 1:
        wav = wav.unsqueeze(0)
    torchaudio.save(f'line_{i}.wav', wav, 24000)
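The example above writes one file per line. To produce a single dialogue file, the per-line clips can be concatenated with a short silence between speakers. A sketch using only NumPy and the standard-library wave module (the `stitch_dialogue` helper is ours, not part of ChatTTS; it assumes float32 clips in [-1, 1] at 24 kHz, as chat.infer returns):

```python
import wave
import numpy as np

# Sketch: stitch per-line clips into one 24 kHz mono 16-bit WAV,
# inserting a short silence between speakers. stitch_dialogue is
# our own helper, not part of the ChatTTS API.
def stitch_dialogue(clips, path, sample_rate=24000, gap_seconds=0.3):
    gap = np.zeros(int(sample_rate * gap_seconds), dtype=np.float32)
    pieces = []
    for clip in clips:
        pieces.append(np.asarray(clip, dtype=np.float32).reshape(-1))
        pieces.append(gap)
    audio = np.concatenate(pieces[:-1])  # drop the trailing gap
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
    with wave.open(path, 'wb') as f:
        f.setnchannels(1)      # mono
        f.setsampwidth(2)      # 16-bit PCM
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())
    return audio
```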
Related on TokRepo
- AI tools for voice -- Voice synthesis and recognition tools
- AI tools for content -- Content creation and generation tools
Common pitfalls
- ChatTTS requires PyTorch and a GPU for fast inference. CPU inference works but is significantly slower. An NVIDIA GPU with at least 4GB VRAM is recommended.
- The model downloads on first run (several GB). Ensure you have adequate disk space and bandwidth for the initial setup.
- Audio quality varies by input text length and complexity. Very long texts should be split into sentences for best results.
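The long-text pitfall above can be handled by chunking input into sentence-sized pieces before synthesis. A minimal sketch using a simple regex heuristic (the `split_for_tts` helper and its 120-character default are our own choices, not part of ChatTTS):

```python
import re

# Sketch: split long input into sentence-sized chunks before calling
# chat.infer. The regex is a simple heuristic for ., !, ? boundaries.
def split_for_tts(text, max_chars=120):
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ''
    for sentence in sentences:
        # Start a new chunk when adding this sentence would exceed the limit
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f'{current} {sentence}'.strip()
    if current:
        chunks.append(current)
    return chunks

# Each chunk can then be synthesized separately:
# wavs = chat.infer(split_for_tts(long_text))
```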
Frequently Asked Questions
What languages does ChatTTS support?
ChatTTS primarily supports English and Chinese; the model was trained on conversational data in both languages. Other languages may work with reduced quality but are not officially supported. Check the project repository for updates on language coverage.
Can I keep a consistent voice across generations?
Yes. ChatTTS supports speaker embeddings. You can sample random speakers with sample_random_speaker() or save and reuse specific speaker embeddings for a consistent voice across sessions. This lets you create distinct character voices for dialogue generation.
Do I need a GPU?
A GPU is strongly recommended. ChatTTS uses PyTorch and runs best on NVIDIA GPUs with CUDA support. CPU inference is possible but 5-10x slower. For production use, a GPU with at least 4GB VRAM provides real-time or near-real-time generation.
How does ChatTTS compare to commercial TTS services?
ChatTTS excels at conversational expressiveness -- laughter, pauses, and emotional variation -- which many commercial services handle poorly. Commercial services (ElevenLabs, Azure TTS) may offer higher raw audio quality and more voice options, but ChatTTS is free, runs locally, and has no API costs.
Can I use ChatTTS in production?
Yes. ChatTTS can be integrated into production apps via its Python API. For high-throughput scenarios, run it as a microservice behind an API endpoint. Be mindful of licensing -- check the project repository for the current license terms before commercial deployment.
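Reusing a specific voice across sessions, as described in the FAQ, comes down to persisting the speaker embedding (a plain string in recent ChatTTS versions) and loading it back later. A minimal sketch (the save/load helpers and the 'narrator.txt' filename are our own, not part of the ChatTTS API):

```python
from pathlib import Path

# Sketch: persist a speaker embedding so the same voice can be reused
# across sessions. Assumes sample_random_speaker() returns a string,
# as in recent ChatTTS versions; the helpers below are our own.
def save_speaker(embedding: str, path: str) -> None:
    Path(path).write_text(embedding, encoding='utf-8')

def load_speaker(path: str) -> str:
    return Path(path).read_text(encoding='utf-8')

# Usage (requires ChatTTS):
# speaker = chat.sample_random_speaker()
# save_speaker(speaker, 'narrator.txt')
# params = ChatTTS.Chat.InferCodeParams(spk_emb=load_speaker('narrator.txt'))
```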