Scripts · Apr 2, 2026 · 2 min read

ChatTTS — Expressive Text-to-Speech for Dialogue

Generate natural conversational speech with laughter, pauses, and emotion. Optimized for dialogue scenarios. 39K+ GitHub stars.

Script Depot · Community

TL;DR

ChatTTS generates conversational speech with laughter, pauses, and emotion.

§01

What it is

ChatTTS is an open-source text-to-speech model designed specifically for dialogue scenarios. Where TTS systems tuned for neutral narration tend to sound flat in conversation, ChatTTS generates expressive audio with laughter, pauses, interjections, and emotional variation. It supports English and Chinese and aims for speech that sounds like natural conversation.

It targets developers building chatbots, voice assistants, podcast generators, and any application where AI-generated speech needs to sound human and conversational.

§02

How it saves time or tokens

ChatTTS can replace paid commercial TTS APIs for conversational use cases. The model runs locally, so there are no per-character costs or API rate limits. On a GPU it generates audio from short texts in seconds, and the expressive controls (laughter, pauses) are embedded in the input text as tokens, so no separate audio processing pipeline is needed.

§03

How to use

Install and set up:

pip install ChatTTS

Generate speech:

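import ChatTTS
import torch
import torchaudio

chat = ChatTTS.Chat()
chat.load(compile=False)  # Downloads model on first run

texts = ['Hey, have you tried this new AI tool? It is amazing.']
wavs = chat.infer(texts)  # returns one NumPy waveform per input text
# torchaudio.save needs a (channels, samples) tensor, so convert and reshape to mono
torchaudio.save('output.wav', torch.from_numpy(wavs[0]).reshape(1, -1), 24000)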

Add expressiveness with control tokens:

import torch  # needed for the NumPy-to-tensor conversion below

# Use special tokens for laughter, pauses, etc.
texts = ['So I tried to deploy it and [laugh] it actually worked on the first try.']
wavs = chat.infer(texts)
torchaudio.save('expressive.wav', torch.from_numpy(wavs[0]).reshape(1, -1), 24000)
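Because control tokens are plain text, a typo such as [laff] may simply be read aloud or ignored. The following is a minimal sketch of a pre-flight check that lists bracketed tokens not in an allow-list. The helper name unknown_tokens and the KNOWN_TOKENS set are illustrative assumptions, not part of the ChatTTS API; only [laugh] appears in this page, and the other entries should be verified against the project repository.

```python
import re
from typing import Iterable, List

# Assumption: allow-list of control tokens. Only "[laugh]" is taken from the
# examples above; "[uv_break]" and "[lbreak]" should be checked against the
# ChatTTS repository before relying on them.
KNOWN_TOKENS = frozenset({"[laugh]", "[uv_break]", "[lbreak]"})

# Matches bracketed lowercase identifiers such as "[laugh]" or "[uv_break]".
_TOKEN = re.compile(r"\[[a-z_0-9]+\]")


def unknown_tokens(text: str, known: Iterable[str] = KNOWN_TOKENS) -> List[str]:
    """Return bracketed tokens in text that are not in the allow-list."""
    allowed = set(known)
    return [tok for tok in _TOKEN.findall(text) if tok not in allowed]
```

Calling unknown_tokens(text) before chat.infer gives a cheap way to catch misspelled tokens in scripted dialogue.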

§04

Example

import ChatTTS
import torchaudio
import torch

chat = ChatTTS.Chat()
chat.load(compile=False)

# Generate a dialogue with different speakers
speaker_a = chat.sample_random_speaker()
speaker_b = chat.sample_random_speaker()

lines = [
    ('What do you think about using AI for code review?', speaker_a),
    ('Honestly? [laugh] It catches things I miss all the time.', speaker_b),
    ('Same here. The false positive rate is still annoying though.', speaker_a),
]

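for i, (text, speaker) in enumerate(lines):
    params = ChatTTS.Chat.InferCodeParams(spk_emb=speaker)
    wav = chat.infer([text], params_infer_code=params)
    # enumerate gives a stable index even if two lines are identical;
    # torchaudio.save needs a (channels, samples) tensor, not a NumPy array
    torchaudio.save(f'line_{i}.wav', torch.from_numpy(wav[0]).reshape(1, -1), 24000)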
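The loop above writes one file per line. To assemble a single conversation track, the clips can be concatenated with a short silence between turns. This is a minimal sketch using only the standard library; join_with_silence and the 0.3 second gap are illustrative choices, and the 24000 Hz rate is the one used in the examples above.

```python
from typing import Iterable, List, Sequence

SAMPLE_RATE = 24000  # same rate passed to torchaudio.save in the examples


def join_with_silence(clips: Sequence[Iterable[float]],
                      gap_seconds: float = 0.3,
                      sample_rate: int = SAMPLE_RATE) -> List[float]:
    """Concatenate mono clips, inserting gap_seconds of silence between them."""
    gap = [0.0] * int(round(gap_seconds * sample_rate))
    joined: List[float] = []
    for i, clip in enumerate(clips):
        if i > 0:
            joined.extend(gap)  # silence goes between clips, not before the first
        joined.extend(float(sample) for sample in clip)
    return joined
```

In the dialogue loop, each wav[0].reshape(-1) could be appended to a list of clips; the joined result can then be saved with torchaudio.save('dialogue.wav', torch.tensor(joined).reshape(1, -1), 24000).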

§05

Related on TokRepo

AI tools for voice -- Voice synthesis and recognition tools
AI tools for content -- Content creation and generation tools

§06

Common pitfalls

ChatTTS requires PyTorch, and fast inference needs a GPU. CPU inference works but is significantly slower. An NVIDIA GPU with at least 4GB VRAM is recommended.
The model downloads on first run (several GB). Ensure you have adequate disk space and bandwidth for the initial setup.
Audio quality varies by input text length and complexity. Very long texts should be split into sentences for best results.
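As a rough illustration of the last point, the sketch below splits long input into sentences and regroups them into chunks that can be passed to chat.infer as a list. The function names and the 200-character limit are assumptions for illustration, not values documented by ChatTTS.

```python
import re
from typing import List

# Split after ASCII sentence punctuation followed by whitespace, or directly
# after full-width Chinese punctuation (which is usually not followed by a space).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, keeping the closing punctuation."""
    return [part.strip() for part in _SENTENCE_END.split(text) if part.strip()]


def chunk_text(text: str, max_chars: int = 200) -> List[str]:
    """Group sentences into chunks of at most max_chars where possible.

    A single sentence longer than max_chars is kept whole rather than cut
    mid-sentence. max_chars is a hypothetical tuning knob.
    """
    chunks: List[str] = []
    current = ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Control tokens such as [laugh] stay attached to the sentence they appear in, so chat.infer(chunk_text(long_text)) would return one waveform per chunk.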

Frequently Asked Questions

What languages does ChatTTS support?

ChatTTS primarily supports English and Chinese. The model was trained on conversational data in both languages. Other languages may work with reduced quality but are not officially supported. Check the project repository for updates on language coverage.

Can I control the speaker voice?

Yes. ChatTTS supports speaker embeddings. You can sample a random speaker with sample_random_speaker(), or save and reuse a specific speaker embedding to keep a voice consistent across sessions. This lets you create distinct character voices for dialogue generation.
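A minimal sketch of saving and reloading a speaker follows. It assumes sample_random_speaker() returns a string-encoded embedding, as recent ChatTTS releases appear to do; if your version returns a tensor instead, torch.save and torch.load would be the analogous calls. The helper names and the file name are hypothetical.

```python
from pathlib import Path


def save_speaker(speaker: str, path: str) -> None:
    """Persist a string-encoded speaker embedding to a text file.

    Assumption: the speaker is a str. Anything else is rejected so that a
    tensor is not silently written as its repr.
    """
    if not isinstance(speaker, str):
        raise TypeError("expected a string-encoded speaker embedding")
    Path(path).write_text(speaker, encoding="utf-8")


def load_speaker(path: str) -> str:
    """Load a speaker embedding previously written by save_speaker."""
    return Path(path).read_text(encoding="utf-8")
```

A typical flow would be save_speaker(chat.sample_random_speaker(), 'narrator.spk') once, then ChatTTS.Chat.InferCodeParams(spk_emb=load_speaker('narrator.spk')) in later sessions.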

Does ChatTTS require a GPU?

A GPU is strongly recommended. ChatTTS uses PyTorch and runs best on NVIDIA GPUs with CUDA support. CPU inference is possible but 5-10x slower. For production use, a GPU with at least 4GB VRAM provides real-time or near-real-time generation.

How does ChatTTS compare to commercial TTS services?

ChatTTS excels at conversational expressiveness -- laughter, pauses, and emotional variation -- which many commercial services handle poorly. Commercial services (ElevenLabs, Azure TTS) may offer higher raw audio quality and more voice options, but ChatTTS is free, runs locally, and has no API costs.

Can I use ChatTTS in a production application?

Yes. ChatTTS can be integrated into production apps via its Python API. For high-throughput scenarios, run it as a microservice behind an API endpoint. Be mindful of licensing -- check the project repository for the current license terms before commercial deployment.
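One way to picture the microservice approach is the standard-library sketch below: an HTTP handler that accepts JSON with a "text" field and returns WAV bytes. Everything here is an illustrative assumption (the endpoint shape, the helper names, the placeholder synthesizer); in a real service the synthesize callable would wrap a ChatTTS model loaded once at startup, and a production-grade server would replace http.server.

```python
import io
import json
import struct
import wave
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Iterable, List

SAMPLE_RATE = 24000  # rate used in the examples above


def fake_synthesize(text: str) -> List[float]:
    """Placeholder that returns 0.1 s of silence; swap in a ChatTTS call."""
    return [0.0] * (SAMPLE_RATE // 10)


def to_wav_bytes(samples: Iterable[float], sample_rate: int = SAMPLE_RATE) -> bytes:
    """Encode float samples in [-1.0, 1.0] as 16-bit mono PCM WAV bytes."""
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sample_rate)
        f.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return buf.getvalue()


def make_handler(synthesize: Callable[[str], Iterable[float]]):
    """Build a request handler class bound to the given synthesize function."""

    class TTSHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length", 0))
            try:
                text = json.loads(self.rfile.read(length))["text"]
            except (ValueError, KeyError, TypeError):
                self.send_error(400, 'expected a JSON body with a "text" field')
                return
            body = to_wav_bytes(synthesize(text))
            self.send_response(200)
            self.send_header("Content-Type", "audio/wav")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return TTSHandler


def serve(synthesize: Callable[[str], Iterable[float]],
          host: str = "127.0.0.1", port: int = 8000) -> None:
    """Run the service until interrupted (blocks the calling thread)."""
    HTTPServer((host, port), make_handler(synthesize)).serve_forever()
```

Passing the synthesizer in as a callable keeps model loading separate from request handling, so the handler can be exercised with fake_synthesize before a GPU is involved. Calling serve(fake_synthesize) starts the placeholder service on localhost.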

🙏

Source & Thanks

Created by 2noise. Licensed under AGPL-3.0.

ChatTTS — ⭐ 39,000+
