Quick Use
- POST /v1/speak?model=aura-2-luna-en with JSON {text} for batch
- WebSocket dg.speak.websocket.v('1') for streaming voice agents
- Pair with Deepgram STT or use Voice Agent API for full stack
Intro
Aura is Deepgram's TTS, purpose-built for conversational voice agents rather than long-form narration. It offers roughly 250 ms time-to-first-audio, 12 English voices tuned for natural turn-taking, and both streaming WebSocket and REST APIs. It pairs natively with Deepgram STT for a low-friction single-vendor voice stack.
Best for: customer support voice agents, IVR replacement, and voice copilots where 'sounds like a real person on a phone' beats 'audiobook quality'.
Works with: Deepgram SDKs, REST, WebSocket, Voice Agent API.
Setup time: 5 minutes.
Single audio buffer (REST)
import os

import requests

# Request a single MP3 buffer from the REST endpoint.
resp = requests.post(
    "https://api.deepgram.com/v1/speak",
    headers={
        "Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}",
        "Content-Type": "application/json",
    },
    params={"model": "aura-2-luna-en", "encoding": "mp3"},
    json={"text": "Welcome back to TokRepo. You have three new asset notifications."},
)
resp.raise_for_status()  # fail loudly instead of saving an error body as audio

with open("welcome.mp3", "wb") as f:
    f.write(resp.content)

Streaming WebSocket
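import asyncio
import os

import numpy as np
import sounddevice as sd
from deepgram import DeepgramClient

dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

async def stream():
    # Method names follow the original snippet. Depending on the SDK version,
    # the awaitable client may be exposed under a different attribute
    # (for example dg.speak.asyncwebsocket), so check the SDK docs.
    ws = dg.speak.websocket.v("1")
    await ws.start({
        "model": "aura-2-luna-en",
        "encoding": "linear16",
        "sample_rate": 24000,
    })
    # Play each raw 16-bit PCM chunk as it arrives. sd.play() replaces whatever
    # is currently playing, so a production agent would write chunks to one
    # continuous output stream instead.
    ws.on("AudioData", lambda data: sd.play(np.frombuffer(data, dtype=np.int16), 24000, blocking=False))
    await ws.send_text("Hi there! How can I help you today?")
    await ws.flush()  # ask the server to synthesize the buffered text
    await ws.wait_for_complete()
    await ws.finish()

asyncio.run(stream())

Voice catalog (Aura 2)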
| Voice ID | Description |
|---|---|
| aura-2-luna-en | Warm American female, default |
| aura-2-stella-en | Bright American female, podcast energy |
| aura-2-orion-en | Deep American male, authoritative |
| aura-2-arcas-en | Mid-30s American male, conversational |
| aura-2-asteria-en | Calm British female |
| aura-2-hera-en | Professional American female, customer-service |
| aura-2-helios-en | Warm British male |
| aura-2-perseus-en | American male, neutral |
Spanish, French, German, and Portuguese voices are being added throughout 2026; check developers.deepgram.com/docs/text-to-speech for the current language list.
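Choosing a voice by trait can be sketched as a small lookup over the catalog. This is only an illustration: the IDs and descriptions are copied from the table, while `find_voices` and `pick_voice` are hypothetical helper names, not part of any Deepgram SDK. Matching is done on whole words so that a search for "male" does not also match "female".

```python
import re

# Voice IDs and descriptions copied from the catalog table in this document.
AURA2_VOICES = {
    "aura-2-luna-en": "Warm American female, default",
    "aura-2-stella-en": "Bright American female, podcast energy",
    "aura-2-orion-en": "Deep American male, authoritative",
    "aura-2-arcas-en": "Mid-30s American male, conversational",
    "aura-2-asteria-en": "Calm British female",
    "aura-2-hera-en": "Professional American female, customer-service",
    "aura-2-helios-en": "Warm British male",
    "aura-2-perseus-en": "American male, neutral",
}
DEFAULT_VOICE = "aura-2-luna-en"


def _words(description: str) -> set[str]:
    """Split a description into lowercase whole words (hyphenated words kept intact)."""
    return set(re.findall(r"[a-z0-9-]+", description.lower()))


def find_voices(*keywords: str) -> list[str]:
    """Hypothetical helper: voice IDs whose description contains every keyword."""
    wanted = {k.lower() for k in keywords}
    return [vid for vid, desc in AURA2_VOICES.items() if wanted <= _words(desc)]


def pick_voice(*keywords: str) -> str:
    """Return the first matching voice ID, or the default when nothing matches."""
    matches = find_voices(*keywords)
    return matches[0] if matches else DEFAULT_VOICE


if __name__ == "__main__":
    print(pick_voice("british", "male"))
```

The returned ID can be passed as the `model` parameter in the REST or WebSocket examples. Falling back to the default voice rather than raising is a design choice for agents that must always answer; a stricter variant could raise on an empty match.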
Aura vs ElevenLabs vs Cartesia
| Criterion | Aura | ElevenLabs | Cartesia |
|---|---|---|---|
| Time to first audio | ~250ms | ~280ms | ~75ms |
| English naturalness | High | Highest | High |
| Long-form narration | Fair | Excellent | Good |
| Conversational fit | Excellent | Excellent | Excellent |
| Languages | EN (more in 2026) | 32 | 15 |
| Cost | $0.015/min | $0.015-0.18/min | $0.025/1k chars |
Pricing
- Aura TTS: $0.015/min equivalent (~$0.030/1k characters)
- Free tier: $200 credit at signup
- Voice Agent API bundles STT+LLM+TTS at one per-minute rate
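As a rough worked example of the figures above, the per-character rate can be turned into a cost estimate. This sketch assumes billing is strictly proportional to character count at the listed ~$0.030 per 1k characters, which may not match actual invoicing; the function names are hypothetical.

```python
# Rates copied from the Pricing list; verify against current Deepgram pricing.
AURA_USD_PER_1K_CHARS = 0.030
SIGNUP_CREDIT_USD = 200.0


def estimate_tts_cost(text: str) -> float:
    """Estimated USD cost to synthesize `text`, assuming linear per-character billing."""
    return len(text) / 1000 * AURA_USD_PER_1K_CHARS


def chars_covered(credit_usd: float = SIGNUP_CREDIT_USD) -> int:
    """Whole characters the given credit would cover at the listed rate."""
    return int(credit_usd / AURA_USD_PER_1K_CHARS * 1000)


if __name__ == "__main__":
    greeting = "Welcome back to TokRepo. You have three new asset notifications."
    print(f"{len(greeting)} chars -> ${estimate_tts_cost(greeting):.6f}")
    print(f"Signup credit covers about {chars_covered():,} characters")
```

At the listed rate, the $200 signup credit works out to roughly 6.6 million characters of synthesized text.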
FAQ
Q: Why use Aura over ElevenLabs? A: Single vendor (one bill, one SLA) when paired with Deepgram STT. Faster TTFA than ElevenLabs Turbo. Voice library is smaller — pick ElevenLabs for character voice diversity or 32-language coverage.
Q: Does Aura support SSML? A: Only partially: pauses, emphasis, and basic prosody. Full SSML, such as phoneme tags, is not supported. For complex prosodic control, ElevenLabs and Cartesia have richer markup.
Q: Voice cloning? A: Not yet on Aura; voices are curated. ElevenLabs and Cartesia both offer cloning, so if a branded custom voice is critical, those are the platforms to use. If catalog voices suffice, Aura's combination of quality and latency wins for agents.
Source & Thanks
Built by Deepgram. Aura TTS docs at developers.deepgram.com/docs/text-to-speech.