Scripts · Mar 31, 2026 · 2 min read

Dia — Realistic Dialogue Text-to-Speech Model

Dia is a 1.6B parameter TTS model by Nari Labs that generates realistic dialogue audio from transcripts. 19.2K+ GitHub stars. Supports multi-speaker dialogue, non-verbal sounds, and voice cloning. Apache 2.0 licensed.

Quick Use

Use it first, then decide how deep to go

Copy the commands below to install Dia and generate your first dialogue clip.

# Install
pip install git+https://github.com/nari-labs/dia.git

# Generate dialogue audio
python -c "
from dia.model import Dia
model = Dia.from_pretrained('nari-labs/Dia-1.6B')
text = '[S1] Hey, have you tried Dia yet? [S2] (laughs) Yeah, it sounds incredibly natural!'
output = model.generate(text)
model.save_audio('dialogue.wav', output)
print('Saved dialogue.wav')
"

Requires a GPU with PyTorch 2.0+ and CUDA 12.6. On an RTX 4090, Dia runs at about 2.1x real-time and uses roughly 4.4GB of VRAM.


Intro

Dia is a 1.6 billion parameter text-to-speech model by Nari Labs that directly generates highly realistic dialogue audio from transcripts in a single pass. With 19,200+ GitHub stars and Apache 2.0 license, Dia supports multi-speaker dialogue using [S1] and [S2] speaker tags, non-verbal sound generation (laughter, coughing, throat-clearing), and voice cloning through audio conditioning for emotion and tone control. It achieves 2.1x real-time speed on an RTX 4090 with just 4.4GB VRAM.

Best for: Developers building conversational AI, podcast generation, audiobook creation, or voice interfaces
Works with: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf
Requirements: GPU with PyTorch 2.0+, CUDA 12.6; English only
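The [S1]/[S2] transcript format described above can be assembled programmatically. A minimal sketch — the make_dialogue helper is hypothetical, not part of Dia's API — that turns a list of utterances into a Dia-ready transcript with alternating speaker tags:

```python
# Hypothetical helper (not part of Dia's API): build a transcript in Dia's
# [S1]/[S2] format. Speakers alternate per turn; non-verbal cues such as
# (laughs) or (coughs) can be written inline in any utterance.

def make_dialogue(turns):
    """turns: list of utterance strings; speakers alternate S1, S2, S1, ..."""
    parts = []
    for i, utterance in enumerate(turns):
        tag = f"[S{i % 2 + 1}]"  # even index -> [S1], odd index -> [S2]
        parts.append(f"{tag} {utterance.strip()}")
    return " ".join(parts)

text = make_dialogue([
    "Hey, have you tried Dia yet?",
    "(laughs) Yeah, it sounds incredibly natural!",
])
print(text)
# [S1] Hey, have you tried Dia yet? [S2] (laughs) Yeah, it sounds incredibly natural!
```

The resulting string can be passed straight to model.generate(text) as in the Quick Use block above.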


Key Features

  • Multi-speaker dialogue: Use [S1] and [S2] tags to generate natural conversations
  • Non-verbal sounds: Laughter, coughing, sighing, throat-clearing built in
  • Voice cloning: Condition on reference audio to match emotion and tone
  • Single-pass generation: No multi-step pipeline, generates audio directly from text
  • Fast inference: 2.1x real-time on RTX 4090, 4.4GB VRAM (bfloat16 with compilation)
  • 1.6B parameters: Large enough for quality, small enough to run locally

FAQ

Q: What is Dia? A: Dia is a 1.6B parameter text-to-speech model with 19.2K+ stars that generates realistic multi-speaker dialogue audio from transcripts. It supports non-verbal sounds and voice cloning. Apache 2.0 licensed by Nari Labs.

Q: How do I install Dia? A: Run pip install git+https://github.com/nari-labs/dia.git. Requires a GPU with PyTorch 2.0+ and CUDA 12.6.

Q: What languages does Dia support? A: Currently English only. The model generates dialogue audio with natural prosody, pauses, and non-verbal sounds.


🙏

Source & Thanks

Created by Nari Labs. Licensed under Apache 2.0. nari-labs/dia — 19,200+ GitHub stars
