Scripts · May 11, 2026 · 5 min read

Deepgram Aura TTS — Text-to-Speech for Voice Agents

Deepgram Aura TTS produces natural English speech with ~250ms time-to-first-audio. Streaming WebSocket, 12 voices, tuned for conversational agents rather than narration.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and raw content so agents can evaluate compatibility, risk, and next steps.

Stage only · 17/100
Agent surface: any MCP/CLI agent
Type: Skill
Install: Stage only
Trust: New
Entry: Asset

Universal CLI command:
npx tokrepo install 12787f56-eff3-402d-a1f7-1eb2ce567400
Introduction

Aura is Deepgram's TTS — purpose-built for conversational voice agents rather than long-form narration. 250ms time-to-first-audio, 12 English voices tuned for natural turn-taking, streaming WebSocket and REST APIs. Pairs natively with Deepgram STT for a low-friction single-vendor voice stack. Best for: customer support voice agents, IVR replacement, voice copilots where 'sounds like a real person on a phone' beats 'audiobook quality'. Works with: Deepgram SDKs, REST, WebSocket, Voice Agent API. Setup time: 5 minutes.


Single audio buffer (REST)

import os

import requests

resp = requests.post(
    "https://api.deepgram.com/v1/speak",
    headers={
        "Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}",
        "Content-Type": "application/json",
    },
    params={"model": "aura-2-luna-en", "encoding": "mp3"},
    json={"text": "Welcome back to TokRepo. You have three new asset notifications."},
)
resp.raise_for_status()  # fail fast on a bad key or model name instead of writing an error body to disk
with open("welcome.mp3", "wb") as f:
    f.write(resp.content)

Streaming WebSocket

import asyncio
import os

import numpy as np
import sounddevice as sd
from deepgram import DeepgramClient

dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

async def stream():
    ws = dg.speak.websocket.v("1")
    await ws.start({
        "model": "aura-2-luna-en",
        "encoding": "linear16",
        "sample_rate": 24000,
    })

    ws.on("AudioData", lambda data: sd.play(np.frombuffer(data, dtype=np.int16), 24000, blocking=False))

    await ws.send_text("Hi there! How can I help you today?")
    await ws.flush()
    await ws.wait_for_complete()
    await ws.finish()

asyncio.run(stream())

Voice catalog (Aura 2)

Voice ID           Description
aura-2-luna-en     Warm American female (default)
aura-2-stella-en   Bright American female, podcast energy
aura-2-orion-en    Deep American male, authoritative
aura-2-arcas-en    Mid-30s American male, conversational
aura-2-asteria-en  Calm British female
aura-2-hera-en     Professional American female, customer-service
aura-2-helios-en   Warm British male
aura-2-perseus-en  American male, neutral

Spanish, French, German, and Portuguese voices are being added throughout 2026; check developers.deepgram.com/docs/text-to-speech for the current language list.
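The catalog above is easy to keep in code. A minimal sketch: the dict mirrors the table, and pick_voice() is a hypothetical convenience helper, not part of the Deepgram SDK.

```python
# Aura 2 catalog from the table above, keyed by voice ID.
AURA_VOICES = {
    "aura-2-luna-en":    {"accent": "American", "gender": "female", "style": "warm (default)"},
    "aura-2-stella-en":  {"accent": "American", "gender": "female", "style": "bright, podcast energy"},
    "aura-2-orion-en":   {"accent": "American", "gender": "male",   "style": "deep, authoritative"},
    "aura-2-arcas-en":   {"accent": "American", "gender": "male",   "style": "mid-30s, conversational"},
    "aura-2-asteria-en": {"accent": "British",  "gender": "female", "style": "calm"},
    "aura-2-hera-en":    {"accent": "American", "gender": "female", "style": "professional, customer-service"},
    "aura-2-helios-en":  {"accent": "British",  "gender": "male",   "style": "warm"},
    "aura-2-perseus-en": {"accent": "American", "gender": "male",   "style": "neutral"},
}

def pick_voice(accent=None, gender=None):
    """Return the first catalog voice matching the given filters."""
    for voice_id, traits in AURA_VOICES.items():
        if accent and traits["accent"] != accent:
            continue
        if gender and traits["gender"] != gender:
            continue
        return voice_id
    return "aura-2-luna-en"  # fall back to the default voice

print(pick_voice(accent="British", gender="male"))  # aura-2-helios-en
```

The returned ID drops straight into the `model` parameter of either the REST or WebSocket examples above.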

Aura vs ElevenLabs vs Cartesia

Metric               Aura               ElevenLabs       Cartesia
Time to first audio  ~250ms             ~280ms           ~75ms
English naturalness  High               Highest          High
Long-form narration  Fair               Excellent        Good
Conversational fit   Excellent          Excellent        Excellent
Languages            EN (more in 2026)  32               15
Cost                 $0.015/min         $0.015-0.18/min  $0.025/1k chars
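TTFA is only one slice of a voice agent's response time. An illustrative turn-latency budget using the TTFA figures from the table; the STT and LLM numbers are placeholder assumptions, not measured values:

```python
# Time-to-first-audio from the comparison table, in milliseconds.
TTS_TTFA_MS = {"aura": 250, "elevenlabs": 280, "cartesia": 75}

def turn_latency_ms(tts, stt_ms=300, llm_first_token_ms=400):
    """Rough time from end of user speech to first agent audio.

    stt_ms and llm_first_token_ms are illustrative assumptions.
    """
    return stt_ms + llm_first_token_ms + TTS_TTFA_MS[tts]

for tts in TTS_TTFA_MS:
    print(tts, turn_latency_ms(tts), "ms")  # aura -> 950 ms
```

Under these assumptions the TTS engine accounts for well under a third of the perceived turn latency, which is why "conversational fit" matters as much as raw TTFA.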

Pricing

  • Aura TTS: $0.015/min equivalent (~$0.030/1k characters)
  • Free tier: $200 credit at signup
  • Voice Agent API bundles STT+LLM+TTS at one per-minute rate
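A back-of-envelope monthly cost at the listed rate of ~$0.030 per 1k characters; the call volume and reply length below are assumptions for illustration only:

```python
# Listed Aura rate: ~$0.030 per 1,000 characters.
RATE_PER_1K_CHARS = 0.030

def monthly_tts_cost(calls_per_day, chars_per_call, days=30):
    """Estimated monthly TTS spend in dollars."""
    total_chars = calls_per_day * chars_per_call * days
    return total_chars / 1000 * RATE_PER_1K_CHARS

# Hypothetical workload: 1,000 calls/day, ~2,000 spoken characters per call.
print(f"${monthly_tts_cost(1000, 2000):,.2f}")  # $1,800.00
```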

FAQ

Q: Why use Aura over ElevenLabs? A: Single vendor (one bill, one SLA) when paired with Deepgram STT. Faster TTFA than ElevenLabs Turbo. Voice library is smaller — pick ElevenLabs for character voice diversity or 32-language coverage.

Q: Does Aura support SSML? A: Limited: pauses, emphasis, and basic prosody are supported, but full SSML features such as phoneme tags are not. For complex prosodic control, ElevenLabs and Cartesia offer richer markup.

Q: Voice cloning? A: Not yet on Aura — voices are curated. ElevenLabs and Cartesia both offer cloning. If branded custom voice is critical, those are the platforms. If catalog voices suffice, Aura's quality + latency wins for agents.


Quick Use

  1. POST /v1/speak?model=aura-2-luna-en with JSON {text} for batch
  2. WebSocket dg.speak.websocket.v('1') for streaming voice agents
  3. Pair with Deepgram STT or use Voice Agent API for full stack


Source & Thanks

Built by Deepgram. Aura TTS docs at developers.deepgram.com/docs/text-to-speech.

deepgram/deepgram-python-sdk

