Scripts · Mar 31, 2026 · 2 min read

Dia — Realistic Dialogue Text-to-Speech Model

Dia is a 1.6B-parameter TTS model by Nari Labs that generates realistic dialogue audio from transcripts. It has 19.2K+ GitHub stars and supports multi-speaker dialogue, non-verbal sounds, and voice cloning.

Script Depot · Community

TL;DR

Dia generates realistic dialogue audio from transcripts with multi-speaker and voice cloning support.

§01

What it is

Dia is a 1.6B-parameter text-to-speech model built by Nari Labs. It generates realistic dialogue audio from text transcripts and supports multiple speakers in a single generation, non-verbal sounds such as laughter and sighs, and voice cloning from reference audio.

Dia targets podcast producers, content creators, and developers building conversational AI interfaces who need natural-sounding multi-speaker audio without recording studios.

§02

How it saves time or tokens

Traditional multi-speaker TTS requires generating each speaker separately and splicing audio. Dia produces a complete multi-speaker conversation in a single pass. The transcript format uses speaker tags like [S1] and [S2], so you write one script and get one audio file with distinct voices.
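The tagged script format described above can be assembled programmatically. The following is a minimal sketch in plain Python; `build_transcript` is a hypothetical helper written for illustration, not part of Dia, and it assumes only the `[S1]`/`[S2]` tag convention mentioned in the text.

```python
from typing import Iterable, Tuple


def build_transcript(turns: Iterable[Tuple[int, str]]) -> str:
    """Join (speaker_number, line) pairs into an [S1]/[S2]-tagged script."""
    lines = []
    for speaker, text in turns:
        if speaker < 1:
            raise ValueError("speaker numbers start at 1")
        # One tagged line per turn, e.g. "[S2] (laughs) Yeah."
        lines.append(f"[S{speaker}] {text.strip()}")
    return "\n".join(lines)


script = build_transcript([
    (1, "Have you tried the new model?"),
    (2, "(laughs) Yeah, it is surprisingly good."),
])
print(script)
```

Keeping the turns as data makes it easy to reorder or reassign speakers before producing the single script that is passed to the model.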

Voice cloning with a short reference clip eliminates the need for voice actor sessions for consistent character voices.

§03

How to use

1. Install Dia from the upstream repository: pip install git+https://github.com/nari-labs/dia.git
2. Prepare a transcript with speaker tags and optional non-verbal annotations.
3. Run generation with the CLI or the Python API.
4. For voice cloning, provide a reference audio file for each speaker.

§04

Example

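from dia.model import Dia

# Load the pretrained checkpoint from Hugging Face. The class path and
# checkpoint name follow the upstream README; verify them against the
# version you install.
model = Dia.from_pretrained('nari-labs/Dia-1.6B')

transcript = '''
[S1] Have you tried the new model?
[S2] (laughs) Yeah, it is surprisingly good.
[S1] Right? The voice quality is way better than I expected.
[S2] I might use it for my podcast intros.
'''

# generate() returns the audio as an array; save_audio() writes it to disk.
audio = model.generate(transcript)
model.save_audio('dialogue.wav', audio)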

The output is a single WAV file with two distinct speaker voices and the laugh rendered naturally.

§05

Related on TokRepo

Voice tools -- Text-to-speech and voice AI tools
Content tools -- Content creation and production tools

§06

Common pitfalls

Voice cloning quality degrades with noisy or short reference clips; use clean audio of at least 10 seconds
Non-verbal sound tags must match the model's vocabulary; unsupported tags are silently ignored
Running the 1.6B model requires a GPU with at least 6 GB VRAM; CPU inference is possible but very slow
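Because unsupported tags are dropped without an error, it can help to check a script before generation. The sketch below is one way to do that in plain Python. `KNOWN_TAGS` is a placeholder allowlist containing only the two sounds this page names (laughter and sighs); it is an assumption, not the model's actual vocabulary, so replace it with the list from the upstream documentation.

```python
import re
from typing import List, Set

# Assumption: placeholder allowlist, NOT the model's authoritative vocabulary.
KNOWN_TAGS: Set[str] = {"laughs", "sighs"}

# Matches parenthesized annotations such as "(laughs)".
TAG_PATTERN = re.compile(r"\(([^()]+)\)")


def unknown_tags(transcript: str, known: Set[str] = KNOWN_TAGS) -> List[str]:
    """Return parenthesized annotations that are not in the allowlist."""
    found = {tag.strip().lower() for tag in TAG_PATTERN.findall(transcript)}
    return sorted(found - set(known))


print(unknown_tags("[S1] Hi (laughs) [S2] Sorry (yawns)"))
```

Note that this simple pattern also flags ordinary parenthetical text, so treat the result as a list of candidates to review rather than definite errors.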

Frequently Asked Questions

How many speakers can Dia handle in one generation?

Dia supports multi-speaker dialogue with distinct speaker tags. The typical use case is two speakers, but the model can handle additional speakers with diminishing voice distinction. For best results, stick to two or three speakers per generation.
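A quick pre-flight check can flag scripts that go beyond the two-or-three-speaker guideline above. This is a minimal sketch in plain Python; `speaker_ids` and `exceeds_speaker_limit` are hypothetical helpers, and the default limit of 3 simply mirrors the recommendation in this answer.

```python
import re
from typing import List

# Matches speaker tags such as "[S1]" and captures the number.
SPEAKER_TAG = re.compile(r"\[S(\d+)\]")


def speaker_ids(transcript: str) -> List[int]:
    """Return the distinct speaker numbers used in the transcript, sorted."""
    return sorted({int(n) for n in SPEAKER_TAG.findall(transcript)})


def exceeds_speaker_limit(transcript: str, limit: int = 3) -> bool:
    """True when the script uses more distinct speakers than `limit`."""
    return len(speaker_ids(transcript)) > limit


script = "[S1] Hi.\n[S2] Hello.\n[S1] Bye."
print(speaker_ids(script), exceeds_speaker_limit(script))
```

If the check fails, one option is to split the conversation into several generations with fewer speakers each.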

Does Dia require a GPU?

A GPU with at least 6 GB VRAM is recommended for real-time generation. The 1.6B parameter model runs on consumer GPUs like the RTX 3060 or above. CPU inference works but is significantly slower, making it impractical for long dialogues.

How does voice cloning work in Dia?

Provide a reference audio clip (at least 10 seconds of clean speech) for each speaker. Dia extracts voice characteristics and applies them during generation. The cloned voice maintains the speaking style and timbre of the reference across the entire dialogue.
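The 10-second guideline can be checked before a clip is used as a reference. The sketch below relies only on Python's standard `wave` module, so it assumes the clip is an uncompressed PCM WAV file; the helper names and the `MIN_SECONDS` constant are illustrative and not part of Dia.

```python
import wave

# Minimum reference length suggested on this page.
MIN_SECONDS = 10.0


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file: frame count divided by frame rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, min_seconds: float = MIN_SECONDS) -> bool:
    """True when the clip meets the minimum reference length."""
    return wav_duration_seconds(path) >= min_seconds
```

For example, `is_long_enough("speaker1.wav")` (a hypothetical file name) returns `False` for a 6-second clip. The check covers length only; it says nothing about background noise, which still needs a listen.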

What audio formats does Dia output?

Dia outputs WAV files by default. You can configure the sample rate (16 kHz, 22 kHz, or 44.1 kHz). For other formats like MP3 or FLAC, post-process the WAV output with ffmpeg or a Python audio library.
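The ffmpeg post-processing step can be scripted. This is a minimal sketch that assumes `ffmpeg` is installed and on the PATH; the function names are hypothetical. It uses only the basic `-y` (overwrite) and `-i` (input) options and lets ffmpeg pick the encoder from the output extension.

```python
import subprocess
from pathlib import Path
from typing import List


def ffmpeg_convert_command(wav_path: str, target_ext: str) -> List[str]:
    """Build the ffmpeg argument list for converting a WAV to another format."""
    out_path = Path(wav_path).with_suffix("." + target_ext.lstrip("."))
    return ["ffmpeg", "-y", "-i", str(wav_path), str(out_path)]


def convert(wav_path: str, target_ext: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_convert_command(wav_path, target_ext), check=True)
```

Calling `convert("dialogue.wav", "mp3")` would write `dialogue.mp3` next to the source file. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.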

Is Dia open source?

Yes. Dia is released under the Apache 2.0 license. The model weights and code are available on GitHub. You can fine-tune the model on your own data for domain-specific voice quality.

Source & Thanks

Created by Nari Labs. Licensed under Apache 2.0. Repository: nari-labs/dia (19,200+ GitHub stars).
