Scripts · May 11, 2026 · 5 min read

Deepgram Aura TTS — Text-to-Speech for Voice Agents

Deepgram Aura TTS produces natural English speech with ~250ms time-to-first-audio. Streaming WebSocket and REST APIs, 12 voices, tuned for conversational agents rather than narration.

Agent ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, install contract, metadata JSON, adapter-aware plan, and raw content links so agents can judge fit, risk, and next actions.

Stage only · 17/100
Agent surface
Any MCP/CLI agent
Kind
Skill
Install
Stage only
Trust
Trust: New
Entrypoint
Asset
Universal CLI install command
npx tokrepo install 12787f56-eff3-402d-a1f7-1eb2ce567400
Intro

Aura is Deepgram's TTS — purpose-built for conversational voice agents rather than long-form narration. 250ms time-to-first-audio, 12 English voices tuned for natural turn-taking, streaming WebSocket and REST APIs. Pairs natively with Deepgram STT for a low-friction single-vendor voice stack. Best for: customer support voice agents, IVR replacement, voice copilots where 'sounds like a real person on a phone' beats 'audiobook quality'. Works with: Deepgram SDKs, REST, WebSocket, Voice Agent API. Setup time: 5 minutes.


Single audio buffer (REST)

import os

import requests

resp = requests.post(
    "https://api.deepgram.com/v1/speak",
    headers={"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}", "Content-Type": "application/json"},
    params={"model": "aura-2-luna-en", "encoding": "mp3"},
    json={"text": "Welcome back to TokRepo. You have three new asset notifications."},
)
resp.raise_for_status()  # surface auth/quota errors instead of writing an error body to disk
with open("welcome.mp3", "wb") as f:
    f.write(resp.content)
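For longer utterances you may not want to hold the whole MP3 in memory before playback or saving. A minimal sketch that spools the response to disk as chunks arrive; `stream=True` and `iter_content` are plain `requests` features, not Deepgram-specific, and `synthesize_streaming` is a hypothetical helper name:

```python
import os

import requests

def write_chunks(chunks, path):
    """Write an iterable of byte chunks to a file; return total bytes written."""
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            written += len(chunk)
    return written

def synthesize_streaming(text, path, model="aura-2-luna-en"):
    """POST to /v1/speak and spool the MP3 to disk as the body arrives."""
    resp = requests.post(
        "https://api.deepgram.com/v1/speak",
        headers={"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}"},
        params={"model": model, "encoding": "mp3"},
        json={"text": text},
        stream=True,  # iterate the response body instead of buffering it whole
    )
    resp.raise_for_status()
    return write_chunks(resp.iter_content(chunk_size=4096), path)
```

This trades a few milliseconds of per-chunk overhead for a flat memory profile, which matters once responses run past a few sentences.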

Streaming WebSocket

import asyncio
import os

import numpy as np
import sounddevice as sd
from deepgram import DeepgramClient

dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

async def stream():
    ws = dg.speak.websocket.v("1")
    await ws.start({
        "model": "aura-2-luna-en",
        "encoding": "linear16",
        "sample_rate": 24000,
    })

    ws.on("AudioData", lambda data: sd.play(np.frombuffer(data, dtype=np.int16), 24000, blocking=False))

    await ws.send_text("Hi there! How can I help you today?")
    await ws.flush()
    await ws.wait_for_complete()
    await ws.finish()

asyncio.run(stream())
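When the text comes from a streaming LLM rather than a fixed string, a common pattern is to buffer tokens until a sentence boundary and flush each complete sentence through `send_text`, so Aura can start speaking before the full reply exists. A sketch of the buffering side, assuming the `ws` methods shown above and a stand-in async token source; the naive period/question-mark split is illustrative and will mis-handle abbreviations like "Dr.":

```python
SENTENCE_ENDS = (".", "!", "?")

def drain_sentences(buffer: str):
    """Split complete sentences off a token buffer.

    Returns (list_of_complete_sentences, remaining_partial_text)."""
    sentences, start = [], 0
    for i, ch in enumerate(buffer):
        if ch in SENTENCE_ENDS:
            sentences.append(buffer[start:i + 1].strip())
            start = i + 1
    return sentences, buffer[start:]

async def speak_llm_stream(ws, token_stream):
    """Feed an async LLM token stream to the TTS websocket sentence-by-sentence."""
    pending = ""
    async for token in token_stream:
        pending += token
        ready, pending = drain_sentences(pending)
        for sentence in ready:
            await ws.send_text(sentence)  # synthesis starts before the reply is done
    if pending.strip():
        await ws.send_text(pending)  # flush any trailing fragment
    await ws.flush()
```

Sentence-level flushing keeps perceived latency close to the model's first-sentence time rather than its full-response time.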

Voice catalog (Aura 2)

Voice ID            Description
aura-2-luna-en      Warm American female, default
aura-2-stella-en    Bright American female, podcast energy
aura-2-orion-en     Deep American male, authoritative
aura-2-arcas-en     Mid-30s American male, conversational
aura-2-asteria-en   Calm British female
aura-2-hera-en      Professional American female, customer-service
aura-2-helios-en    Warm British male
aura-2-perseus-en   American male, neutral

Spanish, French, German, and Portuguese voices added throughout 2026 — check developers.deepgram.com/docs/text-to-speech for current language list.
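Switching voices is just a matter of the `model` parameter. A small filter helper over the catalog above; the gender/accent tags are my own reading of the voice descriptions, not metadata returned by the API:

```python
# Catalog from the table above, tagged by the traits in each description.
AURA_VOICES = {
    "aura-2-luna-en":    {"gender": "female", "accent": "american"},
    "aura-2-stella-en":  {"gender": "female", "accent": "american"},
    "aura-2-orion-en":   {"gender": "male",   "accent": "american"},
    "aura-2-arcas-en":   {"gender": "male",   "accent": "american"},
    "aura-2-asteria-en": {"gender": "female", "accent": "british"},
    "aura-2-hera-en":    {"gender": "female", "accent": "american"},
    "aura-2-helios-en":  {"gender": "male",   "accent": "british"},
    "aura-2-perseus-en": {"gender": "male",   "accent": "american"},
}

def find_voices(gender=None, accent=None):
    """Return voice IDs matching the given filters (None matches anything)."""
    return [
        vid for vid, tags in AURA_VOICES.items()
        if (gender is None or tags["gender"] == gender)
        and (accent is None or tags["accent"] == accent)
    ]
```

The returned ID drops straight into the `model` query parameter of either the REST or WebSocket examples.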

Aura vs ElevenLabs vs Cartesia

Metric               Aura               ElevenLabs        Cartesia
Time to first audio  ~250ms             ~280ms            ~75ms
English naturalness  High               Highest           High
Long-form narration  Fair               Excellent         Good
Conversational fit   Excellent          Excellent         Excellent
Languages            EN (more in 2026)  32                15
Cost                 $0.015/min         $0.015–0.18/min   $0.025/1k chars

Pricing

  • Aura TTS: $0.015/min equivalent (~$0.030/1k characters)
  • Free tier: $200 credit at signup
  • Voice Agent API bundles STT+LLM+TTS at one per-minute rate
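At the listed rate, a back-of-the-envelope spend check for an agent deployment; the rate is copied from the bullet above, and the call volumes are made-up inputs, so verify against current Deepgram pricing before budgeting:

```python
AURA_PER_MIN = 0.015  # $/minute of synthesized audio, per the pricing above

def monthly_tts_cost(calls_per_day, tts_minutes_per_call, days=30):
    """Estimate monthly Aura TTS spend in dollars."""
    return calls_per_day * tts_minutes_per_call * days * AURA_PER_MIN

# e.g. 500 calls/day with ~1.5 min of agent speech per call:
# 500 * 1.5 * 30 * 0.015 = $337.50/month
```

Note only the agent's speaking time is billed here; caller speech is STT, priced separately (or bundled via the Voice Agent API).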

FAQ

Q: Why use Aura over ElevenLabs? A: Single vendor (one bill, one SLA) when paired with Deepgram STT. Faster TTFA than ElevenLabs Turbo. Voice library is smaller — pick ElevenLabs for character voice diversity or 32-language coverage.

Q: Does Aura support SSML? A: Limited SSML support — pauses, emphasis, basic prosody. Full SSML, such as phoneme tags, isn't supported. For complex prosodic control, ElevenLabs or Cartesia have richer markup.

Q: Voice cloning? A: Not yet on Aura — voices are curated. ElevenLabs and Cartesia both offer cloning. If branded custom voice is critical, those are the platforms. If catalog voices suffice, Aura's quality + latency wins for agents.


Quick Use

  1. POST /v1/speak?model=aura-2-luna-en with JSON {text} for batch
  2. WebSocket dg.speak.websocket.v('1') for streaming voice agents
  3. Pair with Deepgram STT or use Voice Agent API for full stack


Source & Thanks

Built by Deepgram. Aura TTS docs at developers.deepgram.com/docs/text-to-speech.

deepgram/deepgram-python-sdk

🙏
