Scripts · May 11, 2026 · 5 min read

Deepgram Aura TTS — Text-to-Speech for Voice Agents

Deepgram Aura produces natural-sounding English speech with roughly 250 ms time to first audio (TTFA). It offers a streaming WebSocket API and 12 voices tuned for conversational agents rather than narration.

Deepgram · Community
Agent-ready

Safe staging for this asset

This asset is staged first. The copied prompt asks the agent to inspect the staged files before activating any scripts, MCP config, or global config.

Stage only · 29/100 · Policy: staging
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Stage only
Trust: Community
Entry point: Asset

Safe staging command

npx -y tokrepo@latest install 12787f56-eff3-402d-a1f7-1eb2ce567400 --target codex

This stages the files first; activation requires reviewing the README and the staged plan.

Introduction

Aura is Deepgram's text-to-speech (TTS) model, built for conversational voice agents rather than long-form narration. It offers about 250 ms time to first audio, 12 English voices tuned for natural turn-taking, and both streaming WebSocket and REST APIs. It pairs with Deepgram STT for a single-vendor voice stack.

Best for: customer support voice agents, IVR replacement, and voice copilots where "sounds like a real person on a phone" beats "audiobook quality".
Works with: Deepgram SDKs, REST, WebSocket, Voice Agent API.
Setup time: 5 minutes.


Single audio buffer (REST)

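import os

import requests

# Request one complete MP3 for a short piece of text.
resp = requests.post(
    "https://api.deepgram.com/v1/speak",
    headers={"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}", "Content-Type": "application/json"},
    params={"model": "aura-2-luna-en", "encoding": "mp3"},
    json={"text": "Welcome back to TokRepo. You have three new asset notifications."},
    timeout=30,
)
resp.raise_for_status()  # stop on 4xx/5xx instead of writing an error body to disk

# The response body is the encoded audio.
with open("welcome.mp3", "wb") as f:
    f.write(resp.content)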
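If adding `requests` as a dependency is not an option, the same call can be sketched with the standard library alone. The helper names `build_speak_request` and `synthesize` are illustrative, not part of any Deepgram SDK; the URL, header format, and query parameters are the ones used in the REST example above.

```python
import json
import urllib.parse
import urllib.request

SPEAK_URL = "https://api.deepgram.com/v1/speak"


def build_speak_request(text, api_key, model="aura-2-luna-en", encoding="mp3"):
    """Build (but do not send) the POST request for one utterance."""
    query = urllib.parse.urlencode({"model": model, "encoding": encoding})
    return urllib.request.Request(
        f"{SPEAK_URL}?{query}",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, api_key, out_path, timeout=30):
    """Send the request and write the returned audio bytes to out_path."""
    req = build_speak_request(text, api_key)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

Keeping request construction separate from sending makes the request easy to inspect or log before any network traffic happens.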

Streaming WebSocket

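import asyncio
import os

import sounddevice as sd
from deepgram import DeepgramClient, SpeakWebSocketEvents

dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

async def stream():
    # Assumes the v3/v4 Python SDK, where the awaitable client is
    # `asyncwebsocket` (`websocket` is the blocking variant). Other SDK
    # releases may expose a different interface.
    ws = dg.speak.asyncwebsocket.v("1")

    # One output stream for the whole reply, so chunks play back to back
    # instead of each chunk cutting off the previous one.
    out = sd.RawOutputStream(samplerate=24000, channels=1, dtype="int16")
    out.start()

    async def on_audio(self, data, **kwargs):
        out.write(data)  # raw 16-bit PCM bytes

    ws.on(SpeakWebSocketEvents.AudioData, on_audio)

    await ws.start({
        "model": "aura-2-luna-en",
        "encoding": "linear16",
        "sample_rate": 24000,
    })

    await ws.send_text("Hi there! How can I help you today?")
    await ws.flush()
    await ws.wait_for_complete()
    await ws.finish()
    out.close()

asyncio.run(stream())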
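The streaming example requests `linear16` at 24000 Hz, which arrives as raw PCM with no file header. To save or replay a reply, the chunks need a WAV container. A minimal sketch using only the standard library follows; the function name `pcm_to_wav` is illustrative, and mono audio is an assumption that the source does not state.

```python
import io
import wave


def pcm_to_wav(chunks, sample_rate=24000, channels=1):
    """Wrap raw 16-bit PCM chunks in a WAV container and return the bytes.

    Assumes mono audio unless `channels` says otherwise.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # linear16 means 2 bytes per sample
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)
    return buf.getvalue()
```

Collect the bytes from each audio event in a list, then call `pcm_to_wav(collected)` once the reply is complete and write the result to a `.wav` file.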

Voice catalog (Aura 2)

Voice ID            Description
aura-2-luna-en      Warm American female, default
aura-2-stella-en    Bright American female, podcast energy
aura-2-orion-en     Deep American male, authoritative
aura-2-arcas-en     Mid-30s American male, conversational
aura-2-asteria-en   Calm British female
aura-2-hera-en      Professional American female, customer-service
aura-2-helios-en    Warm British male
aura-2-perseus-en   American male, neutral

Spanish, French, German, and Portuguese voices are being added throughout 2026; check developers.deepgram.com/docs/text-to-speech for the current language list.
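For agents that choose a voice at runtime, the catalog can be kept as a small lookup table. The sketch below copies the IDs and descriptions from the table above. `find_voices` is a hypothetical helper, and it matches whole words so that "male" does not also match "female".

```python
import re

# Voice IDs and descriptions copied from the catalog table in this article.
VOICES = {
    "aura-2-luna-en": "Warm American female, default",
    "aura-2-stella-en": "Bright American female, podcast energy",
    "aura-2-orion-en": "Deep American male, authoritative",
    "aura-2-arcas-en": "Mid-30s American male, conversational",
    "aura-2-asteria-en": "Calm British female",
    "aura-2-hera-en": "Professional American female, customer-service",
    "aura-2-helios-en": "Warm British male",
    "aura-2-perseus-en": "American male, neutral",
}


def find_voices(*keywords):
    """Return voice IDs whose description contains every keyword as a whole word."""
    wanted = {k.lower() for k in keywords}
    matches = []
    for voice_id, description in VOICES.items():
        words = set(re.findall(r"[a-z0-9-]+", description.lower()))
        if wanted <= words:
            matches.append(voice_id)
    return matches
```

For example, `find_voices("british")` returns the two British voices in catalog order.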

Aura vs ElevenLabs vs Cartesia

Attribute             Aura                ElevenLabs        Cartesia
Time to first audio   ~250 ms             ~280 ms           ~75 ms
English naturalness   High                Highest           High
Long-form narration   Fair                Excellent         Good
Conversational fit    Excellent           Excellent         Excellent
Languages             EN (more in 2026)   32                15
Cost                  $0.015/min          $0.015-0.18/min   $0.025/1k chars
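Time-to-first-audio figures like those in the table depend on region, network path, and text length, so it is worth measuring them from your own deployment. Here is a minimal sketch, assuming you can expose each provider's response as a lazy iterator of audio chunks that sends the request when iteration begins. `time_to_first_audio` and its `clock` parameter are illustrative names.

```python
import time


def time_to_first_audio(chunks, clock=time.perf_counter):
    """Return milliseconds until the first non-empty chunk, or None if none arrives.

    `chunks` should be lazy: the request must be sent when iteration starts,
    otherwise the measurement leaves out the network round trip.
    """
    start = clock()
    for chunk in chunks:
        if chunk:  # skip empty keep-alive or header-only chunks
            return (clock() - start) * 1000.0
    return None
```

Run it several times per provider and compare medians rather than single samples, since cold connections skew the first call.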

Pricing

  • Aura TTS: $0.015/min equivalent (~$0.030/1k characters)
  • Free tier: $200 credit at signup
  • Voice Agent API bundles STT+LLM+TTS at one per-minute rate
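The per-minute and per-character figures above only line up at a particular speaking rate, so budgeting from character counts is more direct. The sketch below uses the two rates quoted in the list as defaults; the function names are illustrative, and actual billing rules should be checked against Deepgram's pricing page.

```python
def cost_from_chars(n_chars, usd_per_1k_chars=0.030):
    """Estimated synthesis cost in USD for a given number of characters."""
    return n_chars / 1000 * usd_per_1k_chars


def implied_chars_per_minute(usd_per_min=0.015, usd_per_1k_chars=0.030):
    """Speaking rate at which the per-minute and per-character prices match."""
    return usd_per_min / usd_per_1k_chars * 1000
```

With the quoted rates the two prices match at about 500 characters per minute of audio. If your agent's replies are denser than that, the effective per-minute cost will be higher than $0.015.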

FAQ

Q: Why use Aura over ElevenLabs?
A: You get a single vendor (one bill, one SLA) when paired with Deepgram STT, and faster TTFA than ElevenLabs Turbo. The voice library is smaller, so pick ElevenLabs for character voice diversity or 32-language coverage.

Q: Does Aura support SSML?
A: Only partially: pauses, emphasis, and basic prosody. Full SSML features such as phoneme tags aren't supported. For complex prosodic control, ElevenLabs and Cartesia have richer markup.

Q: Does Aura support voice cloning?
A: Not yet; Aura's voices are curated. ElevenLabs and Cartesia both offer cloning, so use one of them if a branded custom voice is critical. If catalog voices suffice, Aura's combination of quality and latency is the better fit for agents.


Quick Use

  1. POST /v1/speak?model=aura-2-luna-en with JSON {text} for batch
  2. WebSocket dg.speak.websocket.v('1') for streaming voice agents
  3. Pair with Deepgram STT or use Voice Agent API for full stack

Source & Thanks

Built by Deepgram. Aura TTS docs at developers.deepgram.com/docs/text-to-speech.

deepgram/deepgram-python-sdk
