# Deepgram Aura TTS — Text-to-Speech for Voice Agents

> Deepgram Aura TTS produces natural English speech with roughly 250 ms time to first audio (TTFA). It offers a streaming WebSocket API and 12 voices, tuned for conversational agents rather than narration.

## Install

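Install the packages the examples import (`pip install requests deepgram-sdk sounddevice numpy`), export `DEEPGRAM_API_KEY`, then save either example below as a script file and run it.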

## Quick Use

1. For batch synthesis, POST JSON `{text}` to `/v1/speak?model=aura-2-luna-en`.
2. For streaming voice agents, open a WebSocket with `dg.speak.websocket.v('1')`.
3. Pair with Deepgram STT, or use the Voice Agent API for the full stack.

---

## Intro

Aura is Deepgram's TTS, purpose-built for conversational voice agents rather than long-form narration. It offers roughly 250 ms time to first audio, 12 English voices tuned for natural turn-taking, and both streaming WebSocket and REST APIs. It pairs natively with Deepgram STT for a low-friction, single-vendor voice stack.

- Best for: customer support voice agents, IVR replacement, and voice copilots where "sounds like a real person on a phone" beats "audiobook quality".
- Works with: Deepgram SDKs, REST, WebSocket, Voice Agent API.
- Setup time: 5 minutes.

---

### Single audio buffer (REST)

```python
import os

import requests

# Request one complete MP3 for a short utterance.
resp = requests.post(
    "https://api.deepgram.com/v1/speak",
    headers={"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}", "Content-Type": "application/json"},
    params={"model": "aura-2-luna-en", "encoding": "mp3"},
    json={"text": "Welcome back to TokRepo. You have three new asset notifications."},
)
resp.raise_for_status()  # fail loudly instead of saving an error body as audio

with open("welcome.mp3", "wb") as f:
    f.write(resp.content)
```

### Streaming WebSocket

```python
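import asyncio
import os

import numpy as np
import sounddevice as sd
from deepgram import DeepgramClient, SpeakWebSocketEvents

SAMPLE_RATE = 24000

# Assumes the v3/v4 Python SDK; other major versions expose a different interface.
dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

async def stream():
    ws = dg.speak.asyncwebsocket.v("1")  # async client, needed for the awaits below
    audio = bytearray()
    flushed = asyncio.Event()

    async def on_audio(self, data, **kwargs):
        audio.extend(data)  # raw 16-bit PCM chunk

    async def on_flushed(self, **kwargs):
        flushed.set()  # all audio for the flushed text has been sent

    ws.on(SpeakWebSocketEvents.AudioData, on_audio)
    ws.on(SpeakWebSocketEvents.Flushed, on_flushed)

    await ws.start({
        "model": "aura-2-luna-en",
        "encoding": "linear16",
        "sample_rate": SAMPLE_RATE,
    })

    await ws.send_text("Hi there! How can I help you today?")
    await ws.flush()
    await flushed.wait()
    await ws.finish()

    # Buffered playback keeps the sketch short; a live agent would play chunks as they arrive.
    sd.play(np.frombuffer(bytes(audio), dtype=np.int16), SAMPLE_RATE)
    sd.wait()

asyncio.run(stream())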
```
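
In an agent loop, the text usually arrives in fragments from an LLM rather than as one finished string. The following is a minimal sketch of grouping those fragments into whole sentences before handing each one to `send_text`. `sentence_chunks` is a hypothetical helper, not part of the Deepgram SDK, and the split rule (sentence punctuation followed by whitespace) is a simplifying assumption that will mis-split abbreviations such as "e.g. ".

```python
import re
from typing import Iterable, Iterator

# Assumed sentence boundary: ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed text fragments into whole sentences."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last one is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Emit whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()

llm_tokens = ["Hi ", "there! ", "How can ", "I help ", "you today?"]
for sentence in sentence_chunks(llm_tokens):
    print(sentence)
```

Sending sentence-sized pieces lets synthesis start before the model has finished its reply, at the cost of holding back the final partial sentence until the stream ends.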

### Voice catalog (Aura 2)

| Voice ID | Description |
|---|---|
| `aura-2-luna-en` | Warm American female, default |
| `aura-2-stella-en` | Bright American female, podcast energy |
| `aura-2-orion-en` | Deep American male, authoritative |
| `aura-2-arcas-en` | Mid-30s American male, conversational |
| `aura-2-asteria-en` | Calm British female |
| `aura-2-hera-en` | Professional American female, customer-service |
| `aura-2-helios-en` | Warm British male |
| `aura-2-perseus-en` | American male, neutral |

Spanish, French, German, and Portuguese voices are being added throughout 2026; check `developers.deepgram.com/docs/text-to-speech` for the current language list.
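
If the voice is chosen at runtime (per tenant or per locale, say), the catalog can be kept as data. This sketch transcribes the table above into a small lookup. The `Voice` class, `pick_voice`, and the accent/gender fields are illustrative assumptions, not something the Deepgram API provides.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Voice:
    voice_id: str
    accent: str
    gender: str

# Transcribed from the catalog table; descriptions reduced to two fields.
VOICES = [
    Voice("aura-2-luna-en", "American", "female"),
    Voice("aura-2-stella-en", "American", "female"),
    Voice("aura-2-orion-en", "American", "male"),
    Voice("aura-2-arcas-en", "American", "male"),
    Voice("aura-2-asteria-en", "British", "female"),
    Voice("aura-2-hera-en", "American", "female"),
    Voice("aura-2-helios-en", "British", "male"),
    Voice("aura-2-perseus-en", "American", "male"),
]

DEFAULT_VOICE = "aura-2-luna-en"  # listed as the default in the table

def pick_voice(accent: Optional[str] = None, gender: Optional[str] = None) -> str:
    """Return the first catalog voice matching the filters, else the default."""
    for voice in VOICES:
        if accent is not None and voice.accent != accent:
            continue
        if gender is not None and voice.gender != gender:
            continue
        return voice.voice_id
    return DEFAULT_VOICE

print(pick_voice(accent="British", gender="male"))
```

The returned ID is the value to pass as the `model` parameter in either example above.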

### Aura vs ElevenLabs vs Cartesia

| Dimension | Aura | ElevenLabs | Cartesia |
|---|---|---|---|
| Time to first audio | ~250ms | ~280ms | ~75ms |
| English naturalness | High | Highest | High |
| Long-form narration | Fair | Excellent | Good |
| Conversational fit | Excellent | Excellent | Excellent |
| Languages | EN (more in 2026) | 32 | 15 |
| Cost | $0.015/min | $0.015-0.18/min | $0.025/1k chars |
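
Time-to-first-audio figures like these depend on region and network path, so they are worth measuring from your own deployment. A minimal sketch, assuming the audio arrives as an iterable of byte chunks (for example a generator that issues the request and then yields the response body in pieces); `time_to_first_audio` and `fake_stream` are hypothetical names used for illustration.

```python
import time
from typing import Callable, Iterable, Optional

def time_to_first_audio(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Optional[float]:
    """Seconds from the start of iteration to the first non-empty chunk."""
    start = clock()
    for chunk in chunks:
        if chunk:
            return clock() - start
    return None  # the stream ended without producing audio

def fake_stream():
    # Stand-in for a real request: the sleep plays the role of
    # network latency plus synthesis time.
    time.sleep(0.05)
    yield b"\x00\x01"
    yield b"\x02\x03"

ttfa = time_to_first_audio(fake_stream())
print(f"TTFA: {ttfa * 1000:.0f} ms")
```

For the measurement to be meaningful, the request has to be sent inside the generator, so that connection setup and server-side synthesis both fall within the timed window.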

### Pricing

- Aura TTS: $0.015/min equivalent (~$0.030/1k characters)
- Free tier: $200 credit at signup
- Voice Agent API bundles STT+LLM+TTS at one per-minute rate
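
For budgeting, the per-character rate is easier to work with than the per-minute one because the text length is known before synthesis. A rough sketch using the ~$0.030 per 1k characters figure quoted above; treat the constant as an assumption to verify against current pricing. `estimate_cost_usd` is a hypothetical helper.

```python
AURA_USD_PER_1K_CHARS = 0.030  # rate quoted in this document; verify before relying on it

def estimate_cost_usd(text: str, usd_per_1k_chars: float = AURA_USD_PER_1K_CHARS) -> float:
    """Estimate synthesis cost from character count alone."""
    return len(text) / 1000 * usd_per_1k_chars

greeting = "Welcome back to TokRepo. You have three new asset notifications."
print(f"{len(greeting)} chars -> ${estimate_cost_usd(greeting):.5f}")
```

Multiplying the per-reply estimate by expected monthly reply volume gives a first-order budget; it ignores free credit and any volume discounts.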

---

### FAQ

**Q: Why use Aura over ElevenLabs?**
A: Paired with Deepgram STT, Aura gives you a single vendor (one bill, one SLA), and its TTFA is faster than ElevenLabs Turbo. The voice library is smaller, so pick ElevenLabs for character voice diversity or 32-language coverage.

**Q: Does Aura support SSML?**
A: Only partially: pauses, emphasis, and basic prosody are supported, but fuller SSML such as phoneme tags is not. For complex prosodic control, ElevenLabs and Cartesia have richer markup.

**Q: Voice cloning?**
A: Not yet on Aura; its voices are curated. ElevenLabs and Cartesia both offer cloning, so use one of them if a branded custom voice is critical. If the catalog voices suffice, Aura's combination of quality and latency wins for agents.

---

## Source & Thanks

> Built by [Deepgram](https://github.com/deepgram). Aura TTS docs at [developers.deepgram.com/docs/text-to-speech](https://developers.deepgram.com/docs/text-to-speech).
>
> [deepgram/deepgram-python-sdk](https://github.com/deepgram/deepgram-python-sdk)

---

Source: https://tokrepo.com/en/workflows/deepgram-aura-tts-text-to-speech-for-voice-agents
Author: Deepgram