# Dia — Realistic Dialogue Text-to-Speech Model

> Dia is a 1.6B parameter TTS model by Nari Labs that generates realistic dialogue audio from transcripts. 19.2K+ GitHub stars, Apache 2.0 licensed. Supports multi-speaker dialogue, non-verbal sounds, and voice cloning.

## Install

```bash
pip install git+https://github.com/nari-labs/dia.git
```

## Quick Use

Save as a script file and run:

```python
from dia.model import Dia

model = Dia.from_pretrained('nari-labs/Dia-1.6B')
text = '[S1] Hey, have you tried Dia yet? [S2] (laughs) Yeah, it sounds incredibly natural!'
output = model.generate(text)
model.save_audio('dialogue.wav', output)
print('Saved dialogue.wav')
```

Requires a GPU with PyTorch 2.0+ and CUDA 12.6. On an RTX 4090: 2.1x real-time, ~4.4GB VRAM.

---

## Intro

Dia is a 1.6 billion parameter text-to-speech model by Nari Labs that generates highly realistic dialogue audio directly from transcripts in a single pass. With 19,200+ GitHub stars and an Apache 2.0 license, Dia supports multi-speaker dialogue using `[S1]` and `[S2]` speaker tags, non-verbal sound generation (laughter, coughing, throat-clearing), and voice cloning through audio conditioning for emotion and tone control. It achieves 2.1x real-time speed on an RTX 4090 with just 4.4GB of VRAM.
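The transcript format above is plain text with inline tags, so it is easy to build programmatically. Below is a small helper sketch (hypothetical, not part of the Dia API) that turns a list of alternating utterances into a Dia-style transcript, assigning `[S1]`/`[S2]` tags and leaving non-verbal cues like `(laughs)` embedded in the text:

```python
def to_dia_transcript(turns):
    """Build a Dia-style transcript from a list of utterance strings.

    Dia expects [S1]/[S2] speaker tags; this helper alternates them,
    so turns[0] is speaker 1, turns[1] is speaker 2, and so on.
    Non-verbal cues such as (laughs) or (coughs) stay inline in the text.
    """
    parts = []
    for i, utterance in enumerate(turns):
        tag = "[S1]" if i % 2 == 0 else "[S2]"
        parts.append(f"{tag} {utterance}")
    return " ".join(parts)

transcript = to_dia_transcript([
    "Hey, have you tried Dia yet?",
    "(laughs) Yeah, it sounds incredibly natural!",
])
print(transcript)
# [S1] Hey, have you tried Dia yet? [S2] (laughs) Yeah, it sounds incredibly natural!
```

The resulting string can be passed directly to `model.generate(...)` as in the Quick Use snippet.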
**Best for**: Developers building conversational AI, podcast generation, audiobook creation, or voice interfaces

**Works with**: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf

**Requirements**: GPU with PyTorch 2.0+, CUDA 12.6, English language only

---

## Key Features

- **Multi-speaker dialogue**: Use `[S1]` and `[S2]` tags to generate natural conversations
- **Non-verbal sounds**: Laughter, coughing, sighing, and throat-clearing are built in
- **Voice cloning**: Condition on reference audio to match emotion and tone
- **Single-pass generation**: No multi-step pipeline; audio is generated directly from text
- **Fast inference**: 2.1x real-time on an RTX 4090, 4.4GB VRAM (bfloat16 with compilation)
- **1.6B parameters**: Large enough for quality, small enough to run locally

---

## FAQ

**Q: What is Dia?**
A: Dia is a 1.6B parameter text-to-speech model with 19.2K+ GitHub stars that generates realistic multi-speaker dialogue audio from transcripts. It supports non-verbal sounds and voice cloning. Apache 2.0 licensed by Nari Labs.

**Q: How do I install Dia?**
A: Run `pip install git+https://github.com/nari-labs/dia.git`. Requires a GPU with PyTorch 2.0+ and CUDA 12.6.

**Q: What languages does Dia support?**
A: Currently English only. The model generates dialogue audio with natural prosody, pauses, and non-verbal sounds.

---

## Source & Thanks

> Created by [Nari Labs](https://github.com/nari-labs). Licensed under Apache 2.0.
> [nari-labs/dia](https://github.com/nari-labs/dia) — 19,200+ GitHub stars

---

Source: https://tokrepo.com/en/workflows/86148916-edf9-4ed9-8348-205c9b535810
Author: Script Depot