# Deepgram Aura TTS: Text-to-Speech for Voice Agents

> Deepgram Aura produces natural-sounding English speech with ~250 ms time to first audio (TTFA). It offers a streaming WebSocket API and 12 voices, and it is tuned for conversational agents rather than narration.

## Install

Install the Python packages the examples use, then save an example as a script file and run it:

`pip install deepgram-sdk requests sounddevice`

## Quick Use

1. Batch: POST `/v1/speak?model=aura-2-luna-en` with a JSON body `{"text": "..."}`.
2. Streaming voice agents: open a WebSocket with `dg.speak.websocket.v('1')`.
3. Full stack: pair Aura with Deepgram STT, or use the Voice Agent API.

---

## Intro

Aura is Deepgram's TTS, purpose-built for conversational voice agents rather than long-form narration. It offers ~250 ms time to first audio, 12 English voices tuned for natural turn-taking, and both streaming WebSocket and REST APIs. It pairs natively with Deepgram STT, giving you a low-friction, single-vendor voice stack.

Best for: customer-support voice agents, IVR replacement, and voice copilots where "sounds like a real person on the phone" matters more than "audiobook quality".

Works with: Deepgram SDKs, REST, WebSocket, Voice Agent API.

Setup time: 5 minutes.

---

### Single audio buffer (REST)

```python
import os

import requests

# Batch synthesis: one HTTP request returns the complete audio file.
resp = requests.post(
    "https://api.deepgram.com/v1/speak",
    headers={
        "Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}",
        "Content-Type": "application/json",
    },
    params={"model": "aura-2-luna-en", "encoding": "mp3"},
    json={"text": "Welcome back to TokRepo. You have three new asset notifications."},
    timeout=30,
)
# Fail loudly on auth/validation errors instead of saving an error body as .mp3.
resp.raise_for_status()

with open("welcome.mp3", "wb") as f:
    f.write(resp.content)
```

### Streaming WebSocket

```python
import os

import sounddevice as sd
from deepgram import DeepgramClient, SpeakWebSocketEvents

SAMPLE_RATE = 24000

dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])
ws = dg.speak.websocket.v("1")  # synchronous client: no asyncio/await needed

# One continuous 16-bit mono output stream. Chunks are appended as they arrive;
# calling sd.play() per chunk would cut off the previous chunk.
with sd.RawOutputStream(samplerate=SAMPLE_RATE, channels=1, dtype="int16") as out:

    def on_audio(client, data, **kwargs):
        # The SDK passes the client first and the raw PCM bytes as `data`.
        out.write(data)

    ws.on(SpeakWebSocketEvents.AudioData, on_audio)

    options = {"model": "aura-2-luna-en", "encoding": "linear16", "sample_rate": SAMPLE_RATE}
    if ws.start(options) is False:
        raise RuntimeError("Could not open the Deepgram speak WebSocket")

    ws.send_text("Hi there! How can I help you today?")
    ws.flush()              # ask the server to synthesize the text sent so far
    ws.wait_for_complete()  # wait until the server reports the flush is done
    ws.finish()
# Leaving the `with` block stops the stream once queued audio has played.
```

### Voice catalog (Aura 2)

| Voice ID | Description |
|---|---|
| `aura-2-luna-en` | Warm American female, default |
| `aura-2-stella-en` | Bright American female, podcast energy |
| `aura-2-orion-en` | Deep American male, authoritative |
| `aura-2-arcas-en` | Mid-30s American male, conversational |
| `aura-2-asteria-en` | Calm British female |
| `aura-2-hera-en` | Professional American female, customer service |
| `aura-2-helios-en` | Warm British male |
| `aura-2-perseus-en` | American male, neutral |

Spanish, French, German, and Portuguese voices are being added throughout 2026; check `developers.deepgram.com/docs/text-to-speech` for the current language list.

### Aura vs ElevenLabs vs Cartesia

| Dimension | Aura | ElevenLabs | Cartesia |
|---|---|---|---|
| Time to first audio | ~250 ms | ~280 ms | ~75 ms |
| English naturalness | High | Highest | High |
| Long-form narration | Fair | Excellent | Good |
| Conversational fit | Excellent | Excellent | Excellent |
| Languages | EN (more in 2026) | 32 | 15 |
| Price (as listed) | $0.015/min | $0.015-0.18/min | $0.025/1k chars |

### Pricing

- Aura TTS: $0.015/min equivalent (~$0.030/1k characters)
- Free tier: $200 credit at signup
- Voice Agent API: bundles STT, LLM, and TTS at a single per-minute rate

---

### FAQ

**Q: Why use Aura over ElevenLabs?**

A: Paired with Deepgram STT, you deal with a single vendor (one bill, one SLA), and TTFA is faster than ElevenLabs Turbo. The voice library is smaller, though: pick ElevenLabs if you need a wide range of character voices or 32-language coverage.

**Q: Does Aura support SSML?**

A: Partially. Pauses, emphasis, and basic prosody are supported; full SSML features such as phoneme tags are not. For complex prosodic control, ElevenLabs and Cartesia offer richer markup.

**Q: Voice cloning?**

A: Not yet on Aura; voices are curated. ElevenLabs and Cartesia both offer cloning, so if a branded custom voice is critical, use one of those platforms.
If catalog voices suffice, Aura's quality and latency win for agents.

---

## Source & Thanks

> Built by [Deepgram](https://github.com/deepgram). Aura TTS docs at [developers.deepgram.com/docs/text-to-speech](https://developers.deepgram.com/docs/text-to-speech).
>
> [deepgram/deepgram-python-sdk](https://github.com/deepgram/deepgram-python-sdk)

---

Source: https://tokrepo.com/en/workflows/deepgram-aura-tts-text-to-speech-for-voice-agents

Author: Deepgram
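For budgeting against the Pricing section above, which quotes Aura at ~$0.030 per 1,000 characters (about $0.015 per minute of audio), here is a minimal back-of-the-envelope sketch. It assumes cost scales linearly with input characters at that quoted rate. The constants and helper names (`request_cost_usd`, `monthly_cost_usd`) are hypothetical illustrations, not part of any Deepgram SDK, and rates should be checked against Deepgram's current pricing page.

```python
"""Back-of-the-envelope Aura TTS cost estimates.

Assumption: cost is linear in input characters at the rate quoted in the
Pricing section (~$0.030 per 1,000 characters). Verify current rates first.
"""

# Rates as quoted in the Pricing section (USD). Hypothetical constants, not an SDK API.
AURA_USD_PER_1K_CHARS = 0.030
AURA_USD_PER_MINUTE = 0.015

# Taken together, the two quoted rates imply ~500 characters of text per audio minute.
IMPLIED_CHARS_PER_MINUTE = 1000 * AURA_USD_PER_MINUTE / AURA_USD_PER_1K_CHARS


def request_cost_usd(text: str, usd_per_1k_chars: float = AURA_USD_PER_1K_CHARS) -> float:
    """Estimated cost of synthesizing `text` once, billed on character count."""
    return len(text) / 1000 * usd_per_1k_chars


def monthly_cost_usd(
    requests_per_day: int,
    avg_chars_per_request: int,
    days: int = 30,
    usd_per_1k_chars: float = AURA_USD_PER_1K_CHARS,
) -> float:
    """Estimated monthly spend for a steady daily request volume."""
    total_chars = requests_per_day * avg_chars_per_request * days
    return total_chars / 1000 * usd_per_1k_chars


if __name__ == "__main__":
    greeting = "Welcome back to TokRepo. You have three new asset notifications."
    print(f"{len(greeting)} chars -> ${request_cost_usd(greeting):.4f}")  # 64 chars -> $0.0019
    print(f"~{IMPLIED_CHARS_PER_MINUTE:.0f} chars per audio minute (implied)")
    # Example volume: 10,000 agent replies a day at ~300 characters each.
    print(f"${monthly_cost_usd(10_000, 300):,.2f} per month")
```

To compare against another vendor's per-character price from the comparison table, pass that rate through `usd_per_1k_chars`.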