Key Features
- Multi-speaker dialogue: Use `[S1]` and `[S2]` tags to generate natural conversations
- Non-verbal sounds: Laughter, coughing, sighing, and throat-clearing built in
- Voice cloning: Condition on reference audio to match emotion and tone
- Single-pass generation: No multi-step pipeline, generates audio directly from text
- Fast inference: 2.1x real-time on RTX 4090, 4.4GB VRAM (bfloat16 with compilation)
- 1.6B parameters: Large enough for quality, small enough to run locally
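The speaker tags and non-verbal cues above are plain text markers in the input transcript. As a minimal sketch, here is one way to assemble such a transcript in Python; the `make_transcript` helper is hypothetical (not part of Dia), only the `[S1]`/`[S2]` tag format and parenthesized non-verbals like `(laughs)` come from the feature list above.

```python
# Hypothetical helper for building a Dia-style transcript.
# Only the [S1]/[S2] tags and (laughs)-style non-verbals are Dia conventions;
# the function itself is an illustrative assumption.
def make_transcript(turns):
    """Alternate [S1]/[S2] speaker tags over a list of utterances."""
    parts = []
    for i, utterance in enumerate(turns):
        tag = "[S1]" if i % 2 == 0 else "[S2]"
        parts.append(f"{tag} {utterance}")
    return " ".join(parts)

text = make_transcript([
    "Have you tried the new model?",
    "Yes! (laughs) The dialogue sounds surprisingly natural.",
])
print(text)
# → [S1] Have you tried the new model? [S2] Yes! (laughs) The dialogue sounds surprisingly natural.
```

The resulting string is what you would pass to the model as input text.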
FAQ
Q: What is Dia?
A: Dia is a 1.6B-parameter text-to-speech model (19.2K+ GitHub stars) that generates realistic multi-speaker dialogue audio from transcripts. It supports non-verbal sounds and voice cloning, and is Apache 2.0 licensed by Nari Labs.
Q: How do I install Dia?
A: Run `pip install git+https://github.com/nari-labs/dia.git`. Requires a GPU, PyTorch 2.0+, and CUDA 12.6.
Q: What languages does Dia support?
A: Currently English only. The model generates dialogue audio with natural prosody, pauses, and non-verbal sounds.