What is Deepgram Aura TTS — Text-to-Speech for Voice Agents?

Deepgram Aura TTS produces natural-sounding English speech with a time to first audio (TTFA) of about 250 ms. It streams over WebSocket and offers 12 voices tuned for conversational agents rather than narration.

Is Deepgram Aura TTS — Text-to-Speech for Voice Agents free to use?

Yes. Deepgram Aura TTS — Text-to-Speech for Voice Agents is freely available on TokRepo. Check the Source & Thanks section on the asset page for the specific open-source license.

How do I install Deepgram Aura TTS — Text-to-Speech for Voice Agents?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

Deepgram Aura TTS — Text-to-Speech for Voice Agents

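Introduction

Aura is Deepgram's TTS, designed for conversational voice agents rather than long-form narration. It offers about 250 ms time to first audio, 12 English voices tuned for natural turn-taking, and both a streaming WebSocket and a REST API. It pairs natively with Deepgram STT for a low-friction, single-vendor voice stack. It suits customer-service voice agents, IVR replacement, and voice copilots where sounding human on the phone matters more than audiobook quality. It is compatible with the Deepgram SDK, REST, WebSocket, and the Voice Agent API. Setup takes about 5 minutes.

Single audio buffer (REST)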

import os

import requests

# Request a single MP3 buffer from the REST endpoint.
resp = requests.post(
    "https://api.deepgram.com/v1/speak",
    headers={"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}", "Content-Type": "application/json"},
    params={"model": "aura-2-luna-en", "encoding": "mp3"},
    json={"text": "Welcome back to TokRepo. You have three new asset notifications."},
)
resp.raise_for_status()  # fail loudly instead of writing an error body to disk

# The response body is the encoded audio.
with open("welcome.mp3", "wb") as f:
    f.write(resp.content)
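
The request above can be wrapped in a small helper so that the model and encoding are chosen in one place. This is a minimal sketch: `build_speak_request` is a hypothetical name, not part of any Deepgram SDK, and it only assembles the arguments without sending anything.

```python
import os

SPEAK_URL = "https://api.deepgram.com/v1/speak"  # endpoint taken from the REST example


def build_speak_request(text, model="aura-2-luna-en", encoding="mp3", api_key=None):
    """Assemble keyword arguments for requests.post(). Hypothetical helper; sends nothing."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # Fall back to the environment variable used elsewhere on this page.
    key = api_key or os.environ["DEEPGRAM_API_KEY"]
    return {
        "url": SPEAK_URL,
        "headers": {"Authorization": f"Token {key}", "Content-Type": "application/json"},
        "params": {"model": model, "encoding": encoding},
        "json": {"text": text},
    }
```

With `requests` installed, the result can be passed straight through, for example `requests.post(**build_speak_request("Hello"))`.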

Streaming WebSocket

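import asyncio
import os

import sounddevice as sd
from deepgram import DeepgramClient, SpeakWebSocketEvents

# Written against the v3-style Deepgram Python SDK. Method and event names
# differ between SDK versions, so check them against the version you install.
dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

async def stream():
    # Use the async client so the calls below can be awaited.
    ws = dg.speak.asyncwebsocket.v("1")

    # One continuous output stream: calling sd.play() per chunk would cut off the previous chunk.
    out = sd.RawOutputStream(samplerate=24000, channels=1, dtype="int16")
    out.start()

    async def on_audio(self, data, **kwargs):
        out.write(data)  # raw 16-bit PCM bytes

    ws.on(SpeakWebSocketEvents.AudioData, on_audio)

    await ws.start({
        "model": "aura-2-luna-en",
        "encoding": "linear16",
        "sample_rate": 24000,
    })

    await ws.send_text("Hi there! How can I help you today?")
    await ws.flush()
    await ws.wait_for_complete()
    await ws.finish()
    out.stop()

asyncio.run(stream())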
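
In a conversational agent the text usually arrives in fragments, for example from an LLM, rather than as one finished string. One common approach is to buffer the fragments and hand each complete sentence to the TTS connection as soon as it closes. The sketch below shows only that buffering step. The function names are hypothetical, the boundary rule is deliberately naive (it also splits after abbreviations such as "Dr."), and nothing here calls the Deepgram SDK.

```python
import re

# Naive sentence boundary: ".", "!" or "?" followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_ready_sentences(buffer):
    """Return (complete_sentences, remainder) for the text accumulated so far."""
    parts = _BOUNDARY.split(buffer)
    # The last part may be an unfinished sentence, so hold it back.
    return [p for p in parts[:-1] if p], parts[-1]


def sentences_from_tokens(tokens):
    """Yield complete sentences from an iterable of text fragments."""
    buffer = ""
    for token in tokens:
        buffer += token
        ready, buffer = split_ready_sentences(buffer)
        yield from ready
    if buffer.strip():
        yield buffer.strip()  # emit whatever is left when the input ends
```

Each yielded sentence could then be passed to a call like `send_text` in the streaming example, followed by a flush, so that audio for the first sentence starts before the rest of the reply has been generated.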

Voice catalog (Aura 2)

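Voice ID	Description
`aura-2-luna-en`	Warm American female, default
`aura-2-stella-en`	Bright American female, podcast energy
`aura-2-orion-en`	Deep American male, authoritative
`aura-2-arcas-en`	American male in his early 30s, conversational
`aura-2-asteria-en`	Calm British female
`aura-2-hera-en`	Professional American female, customer service
`aura-2-helios-en`	Warm British male
`aura-2-perseus-en`	American male, neutral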

Spanish, French, German, and Portuguese are being added over the course of 2026. See developers.deepgram.com/docs/text-to-speech for the current language list.
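
If an application lets callers choose a voice by trait instead of by ID, the catalog can be kept as a small lookup table. The sketch below copies the voice IDs, accents, and genders from the table above. The `VOICES` structure and `pick_voices` function are hypothetical illustrations, not part of any Deepgram API.

```python
# Voice IDs, accents, and genders as listed in the catalog table on this page.
VOICES = {
    "aura-2-luna-en": {"accent": "American", "gender": "female"},
    "aura-2-stella-en": {"accent": "American", "gender": "female"},
    "aura-2-orion-en": {"accent": "American", "gender": "male"},
    "aura-2-arcas-en": {"accent": "American", "gender": "male"},
    "aura-2-asteria-en": {"accent": "British", "gender": "female"},
    "aura-2-hera-en": {"accent": "American", "gender": "female"},
    "aura-2-helios-en": {"accent": "British", "gender": "male"},
    "aura-2-perseus-en": {"accent": "American", "gender": "male"},
}


def pick_voices(accent=None, gender=None):
    """Return voice IDs matching the given traits, in catalog order."""
    return [
        voice_id
        for voice_id, traits in VOICES.items()
        if (accent is None or traits["accent"] == accent)
        and (gender is None or traits["gender"] == gender)
    ]
```

The chosen ID is then used as the `model` value in either of the request examples.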

Aura vs ElevenLabs vs Cartesia

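Dimension	Aura	ElevenLabs	Cartesia
Time to first audio	~250ms	~280ms	~75ms
English naturalness	High	Highest	High
Long-form narration	Fair	Excellent	Good
Conversational fit	Excellent	Excellent	Excellent
Languages	EN (more in 2026)	32	15
Cost	$0.015/min	$0.015-0.18/min	$0.025 per 1K characters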

Pricing

Aura TTS: equivalent to $0.015/minute (about $0.030 per 1K characters)
Free tier: $200 in credit on signup
The Voice Agent API bundles STT + LLM + TTS at a single per-minute rate
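
To turn the figures above into a rough budget, the per-character rate can be applied to the length of the text being synthesized. This is a back-of-the-envelope sketch that uses only the numbers quoted on this page. The function names are hypothetical, and actual billing may round or meter differently.

```python
PRICE_PER_1K_CHARS = 0.030  # USD, approximate figure quoted above
PRICE_PER_MINUTE = 0.015    # USD, equivalent per-minute figure quoted above
FREE_CREDIT = 200.0         # USD, signup credit quoted above


def estimate_cost(num_chars):
    """Approximate USD cost of synthesizing num_chars characters."""
    return num_chars / 1000 * PRICE_PER_1K_CHARS


def equivalent_minutes(num_chars):
    """Audio minutes implied by the two quoted rates for num_chars characters."""
    return estimate_cost(num_chars) / PRICE_PER_MINUTE


def chars_covered_by_credit(credit=FREE_CREDIT):
    """Approximate number of characters the given credit would pay for."""
    return int(credit / PRICE_PER_1K_CHARS * 1000)
```

For example, `estimate_cost(len(text))` gives the approximate cost of a single prompt, and `chars_covered_by_credit()` gives a sense of how far the signup credit goes at the quoted rate.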

FAQ

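Q: Why choose Aura over ElevenLabs? A: Paired with Deepgram STT, you get a single vendor (one bill, one SLA). TTFA is faster than ElevenLabs Turbo. The voice library is smaller, so choose ElevenLabs if you need a wide variety of character voices or coverage of 32 languages.

Q: Does Aura support SSML? A: Limited support: pauses, emphasis, and basic prosody. Full SSML, such as phoneme tags, is not available. For complex prosody control, ElevenLabs and Cartesia offer richer markup.

Q: Voice cloning? A: Not yet in Aura; the voices are curated. ElevenLabs and Cartesia both support cloning, so pick one of those if a custom brand voice is essential. If the catalog voices are enough, Aura's quality and latency win for agent use cases.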
