Scripts · Apr 2, 2026 · 2 min read

ChatTTS — Expressive Text-to-Speech for Dialogue

Generate natural conversational speech with laughter, pauses, and emotion. Optimized for dialogue scenarios. 39K+ GitHub stars.

TL;DR
ChatTTS generates conversational speech with laughter, pauses, and emotion.
§01

What it is

ChatTTS is an open-source text-to-speech model designed specifically for dialogue scenarios. Unlike standard TTS systems that produce flat, robotic speech, ChatTTS generates expressive audio with laughter, pauses, interjections, and emotional variation. It supports both English and Chinese, and can produce speech that sounds like a natural conversation.

It targets developers building chatbots, voice assistants, podcast generators, and any application where AI-generated speech needs to sound human and conversational.

§02

How it saves time or tokens

ChatTTS eliminates the need for expensive commercial TTS APIs for conversational use cases. The model runs locally, so there are no per-character costs or API rate limits. It generates audio from text in seconds, and the expressive controls (laughter, pauses) are embedded via text tokens rather than requiring separate audio processing pipelines.
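To make the savings concrete, here is a back-of-envelope comparison. The per-character rate below is a hypothetical placeholder, not a quote from any provider:

```python
def commercial_tts_cost(num_chars: int, usd_per_million_chars: float = 15.0) -> float:
    """Estimate an API bill for synthesizing num_chars of text.

    The default rate is illustrative only; real providers publish
    their own per-character pricing."""
    return num_chars * usd_per_million_chars / 1_000_000

# A chatbot producing 200 replies/day at ~120 characters each:
daily_chars = 200 * 120
monthly_cost = commercial_tts_cost(daily_chars * 30)
print(f"~${monthly_cost:.2f}/month at the assumed rate; $0 with local ChatTTS inference")
```

With local inference the marginal cost per character is zero; the trade-off is the upfront GPU and the setup time covered below.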

§03

How to use

  1. Install and set up:
pip install ChatTTS
  2. Generate speech:
import ChatTTS
import torch
import torchaudio

chat = ChatTTS.Chat()
chat.load(compile=False)  # Downloads model weights on first run

texts = ['Hey, have you tried this new AI tool? It is amazing.']
wavs = chat.infer(texts)  # returns a list of numpy waveforms at 24 kHz

wav = torch.from_numpy(wavs[0])
if wav.dim() == 1:  # some versions return 1-D audio; torchaudio expects (channels, samples)
    wav = wav.unsqueeze(0)
torchaudio.save('output.wav', wav, 24000)
  3. Add expressiveness with control tokens:
# Special tokens such as [laugh] and [uv_break] mark laughter and pauses inline
texts = ['So I tried to deploy it and [laugh] it actually worked on the first try.']
wavs = chat.infer(texts)
wav = torch.from_numpy(wavs[0])
torchaudio.save('expressive.wav', wav if wav.dim() == 2 else wav.unsqueeze(0), 24000)
§04

Example

import ChatTTS
import torch
import torchaudio

chat = ChatTTS.Chat()
chat.load(compile=False)

# Generate a dialogue with two distinct speakers
speaker_a = chat.sample_random_speaker()
speaker_b = chat.sample_random_speaker()

lines = [
    ('What do you think about using AI for code review?', speaker_a),
    ('Honestly? [laugh] It catches things I miss all the time.', speaker_b),
    ('Same here. The false positive rate is still annoying though.', speaker_a),
]

for i, (text, speaker) in enumerate(lines):
    params = ChatTTS.Chat.InferCodeParams(spk_emb=speaker)
    wavs = chat.infer([text], params_infer_code=params)
    wav = torch.from_numpy(wavs[0])
    if wav.dim() == 1:  # some versions return 1-D audio
        wav = wav.unsqueeze(0)
    torchaudio.save(f'line_{i}.wav', wav, 24000)
§05

Common pitfalls

  • ChatTTS requires PyTorch and a GPU for fast inference. CPU inference works but is significantly slower. An NVIDIA GPU with at least 4GB VRAM is recommended.
  • The model downloads on first run (several GB). Ensure you have adequate disk space and bandwidth for the initial setup.
  • Audio quality varies by input text length and complexity. Very long texts should be split into sentences for best results.
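The last pitfall can be handled with a pre-processing step. A minimal sentence splitter (plain Python, no ChatTTS required) whose chunks you can feed one at a time to chat.infer:

```python
import re

def split_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Split text on sentence-ending punctuation, then merge short
    sentences so each chunk stays under max_chars."""
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    chunks: list[str] = []
    for s in sentences:
        if chunks and len(chunks[-1]) + len(s) + 1 <= max_chars:
            chunks[-1] += ' ' + s
        else:
            chunks.append(s)
    return chunks

text = 'First point. Second point! A much longer third point follows? Yes.'
print(split_sentences(text, max_chars=30))
```

The 200-character default is a guess; tune it to whatever length your hardware synthesizes cleanly.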

Frequently Asked Questions

What languages does ChatTTS support?

ChatTTS primarily supports English and Chinese. The model was trained on conversational data in both languages. Other languages may work with reduced quality but are not officially supported. Check the project repository for updates on language coverage.

Can I control the speaker voice?

Yes. ChatTTS supports speaker embedding. You can sample random speakers with sample_random_speaker() or save and reuse specific speaker embeddings for consistent voice across sessions. This lets you create distinct character voices for dialogue generation.

Does ChatTTS require a GPU?

A GPU is strongly recommended. ChatTTS uses PyTorch and runs best on NVIDIA GPUs with CUDA support. CPU inference is possible but 5-10x slower. For production use, a GPU with at least 4GB VRAM provides real-time or near-real-time generation.

How does ChatTTS compare to commercial TTS services?

ChatTTS excels at conversational expressiveness -- laughter, pauses, and emotional variation -- which many commercial services handle poorly. Commercial services (ElevenLabs, Azure TTS) may offer higher raw audio quality and more voice options, but ChatTTS is free, runs locally, and has no API costs.

Can I use ChatTTS in a production application?

Yes. ChatTTS can be integrated into production apps via its Python API. For high-throughput scenarios, run it as a microservice behind an API endpoint. Be mindful of licensing -- check the project repository for the current license terms before commercial deployment.
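A loaded model instance is not guaranteed to be thread-safe, so a common pattern behind an API endpoint is to serialize inference through a single lock. A minimal sketch, with a stub standing in for the real model so it runs anywhere (synthesize and the stub are my names, not ChatTTS APIs):

```python
import threading

class TTSWorker:
    """Serializes concurrent requests through one model instance."""

    def __init__(self, infer_fn):
        self._infer = infer_fn          # e.g. lambda texts: chat.infer(texts)
        self._lock = threading.Lock()   # one inference at a time

    def synthesize(self, text: str):
        with self._lock:
            return self._infer([text])[0]

# Stub in place of the real model, so the pattern is runnable without a GPU:
worker = TTSWorker(lambda texts: [f'<audio for: {t}>' for t in texts])

results = []
threads = [threading.Thread(target=lambda t=t: results.append(worker.synthesize(t)))
           for t in ('hello', 'world')]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(sorted(results))
```

For real throughput you would run several such workers, each owning its own model copy, behind a load balancer; the lock only protects a single instance.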


Source & Thanks

Created by 2noise. Licensed under AGPL-3.0.

ChatTTS — ⭐ 39,000+
