Scripts · May 11, 2026 · 5 min read

Deepgram Aura TTS — Text-to-Speech for Voice Agents

Deepgram Aura TTS produces natural English speech with ~250ms time-to-first-audio. Streaming WebSocket, 12 voices, tuned for conversational agents rather than narration.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and raw content so agents can evaluate compatibility, risk, and next steps.

Stage only · 17/100
Agent surface: any MCP/CLI agent
Type: Skill
Install: Stage only
Trust: New
Entry: Asset

Universal CLI command:
npx tokrepo install 12787f56-eff3-402d-a1f7-1eb2ce567400
Introduction

Aura is Deepgram's TTS — purpose-built for conversational voice agents rather than long-form narration. 250ms time-to-first-audio, 12 English voices tuned for natural turn-taking, streaming WebSocket and REST APIs. Pairs natively with Deepgram STT for a low-friction single-vendor voice stack. Best for: customer support voice agents, IVR replacement, voice copilots where 'sounds like a real person on a phone' beats 'audiobook quality'. Works with: Deepgram SDKs, REST, WebSocket, Voice Agent API. Setup time: 5 minutes.


Single audio buffer (REST)

import os

import requests

resp = requests.post(
    "https://api.deepgram.com/v1/speak",
    headers={
        "Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}",
        "Content-Type": "application/json",
    },
    params={"model": "aura-2-luna-en", "encoding": "mp3"},
    json={"text": "Welcome back to TokRepo. You have three new asset notifications."},
)
resp.raise_for_status()  # fail fast on a bad key or model name instead of writing an error body to disk
with open("welcome.mp3", "wb") as f:
    f.write(resp.content)

Streaming WebSocket

import asyncio
import os

import numpy as np
import sounddevice as sd
from deepgram import DeepgramClient

dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

async def stream():
    ws = dg.speak.websocket.v("1")
    await ws.start({
        "model": "aura-2-luna-en",
        "encoding": "linear16",
        "sample_rate": 24000,
    })

    ws.on("AudioData", lambda data: sd.play(np.frombuffer(data, dtype=np.int16), 24000, blocking=False))

    await ws.send_text("Hi there! How can I help you today?")
    await ws.flush()
    await ws.wait_for_complete()
    await ws.finish()

asyncio.run(stream())

Voice catalog (Aura 2)

Voice ID           Description
aura-2-luna-en     Warm American female (default)
aura-2-stella-en   Bright American female, podcast energy
aura-2-orion-en    Deep American male, authoritative
aura-2-arcas-en    Mid-30s American male, conversational
aura-2-asteria-en  Calm British female
aura-2-hera-en     Professional American female, customer-service
aura-2-helios-en   Warm British male
aura-2-perseus-en  American male, neutral

Spanish, French, German, and Portuguese voices are being added throughout 2026; check developers.deepgram.com/docs/text-to-speech for the current language list.
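The catalog above is easy to keep in code. A minimal sketch: the dict mirrors the table, and pick_voice() is a hypothetical convenience helper, not part of the Deepgram SDK.

```python
# Aura 2 catalog from the table above, keyed by voice ID.
AURA_VOICES = {
    "aura-2-luna-en":    {"accent": "American", "gender": "female", "style": "warm (default)"},
    "aura-2-stella-en":  {"accent": "American", "gender": "female", "style": "bright, podcast energy"},
    "aura-2-orion-en":   {"accent": "American", "gender": "male",   "style": "deep, authoritative"},
    "aura-2-arcas-en":   {"accent": "American", "gender": "male",   "style": "mid-30s, conversational"},
    "aura-2-asteria-en": {"accent": "British",  "gender": "female", "style": "calm"},
    "aura-2-hera-en":    {"accent": "American", "gender": "female", "style": "professional, customer-service"},
    "aura-2-helios-en":  {"accent": "British",  "gender": "male",   "style": "warm"},
    "aura-2-perseus-en": {"accent": "American", "gender": "male",   "style": "neutral"},
}

def pick_voice(accent=None, gender=None):
    """Return the first catalog voice matching the given filters."""
    for voice_id, traits in AURA_VOICES.items():
        if accent and traits["accent"] != accent:
            continue
        if gender and traits["gender"] != gender:
            continue
        return voice_id
    return "aura-2-luna-en"  # fall back to the default voice

print(pick_voice(accent="British", gender="male"))  # aura-2-helios-en
```

The returned ID drops straight into the `model` parameter of either the REST or WebSocket examples above.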

Aura vs ElevenLabs vs Cartesia

Metric               Aura               ElevenLabs       Cartesia
Time to first audio  ~250ms             ~280ms           ~75ms
English naturalness  High               Highest          High
Long-form narration  Fair               Excellent        Good
Conversational fit   Excellent          Excellent        Excellent
Languages            EN (more in 2026)  32               15
Cost                 $0.015/min         $0.015-0.18/min  $0.025/1k chars
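TTFA is only one slice of a voice agent's response time. An illustrative turn-latency budget using the TTFA figures from the table; the STT and LLM numbers are placeholder assumptions, not measured values:

```python
# Time-to-first-audio from the comparison table, in milliseconds.
TTS_TTFA_MS = {"aura": 250, "elevenlabs": 280, "cartesia": 75}

def turn_latency_ms(tts, stt_ms=300, llm_first_token_ms=400):
    """Rough time from end of user speech to first agent audio.

    stt_ms and llm_first_token_ms are illustrative assumptions.
    """
    return stt_ms + llm_first_token_ms + TTS_TTFA_MS[tts]

for tts in TTS_TTFA_MS:
    print(tts, turn_latency_ms(tts), "ms")  # aura -> 950 ms
```

Under these assumptions the TTS engine accounts for well under a third of the perceived turn latency, which is why "conversational fit" matters as much as raw TTFA.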

Pricing

  • Aura TTS: $0.015/min equivalent (~$0.030/1k characters)
  • Free tier: $200 credit at signup
  • Voice Agent API bundles STT+LLM+TTS at one per-minute rate
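A back-of-envelope monthly cost at the listed rate of ~$0.030 per 1k characters; the call volume and reply length below are assumptions for illustration only:

```python
# Listed Aura rate: ~$0.030 per 1,000 characters.
RATE_PER_1K_CHARS = 0.030

def monthly_tts_cost(calls_per_day, chars_per_call, days=30):
    """Estimated monthly TTS spend in dollars."""
    total_chars = calls_per_day * chars_per_call * days
    return total_chars / 1000 * RATE_PER_1K_CHARS

# Hypothetical workload: 1,000 calls/day, ~2,000 spoken characters per call.
print(f"${monthly_tts_cost(1000, 2000):,.2f}")  # $1,800.00
```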

FAQ

Q: Why use Aura over ElevenLabs? A: Single vendor (one bill, one SLA) when paired with Deepgram STT. Faster TTFA than ElevenLabs Turbo. Voice library is smaller — pick ElevenLabs for character voice diversity or 32-language coverage.

Q: Does Aura support SSML? A: Limited: pauses, emphasis, and basic prosody are supported, but full SSML features such as phoneme tags are not. For complex prosodic control, ElevenLabs and Cartesia offer richer markup.

Q: Voice cloning? A: Not yet on Aura — voices are curated. ElevenLabs and Cartesia both offer cloning. If branded custom voice is critical, those are the platforms. If catalog voices suffice, Aura's quality + latency wins for agents.


Quick Use

  1. POST /v1/speak?model=aura-2-luna-en with JSON {text} for batch
  2. WebSocket dg.speak.websocket.v('1') for streaming voice agents
  3. Pair with Deepgram STT or use Voice Agent API for full stack


Source & Thanks

Built by Deepgram. Aura TTS docs at developers.deepgram.com/docs/text-to-speech.

deepgram/deepgram-python-sdk

