Quick Use
- `pip install deepgram-sdk`
- Build `agent_config` with listen/think/speak sections
- `dg.agent.websocket.v('1').start(agent_config)` + send mic audio + play AudioOutput
Intro
The Deepgram Voice Agent API bundles Deepgram STT (Nova-3), your LLM of choice (Anthropic, OpenAI, Groq, AWS Bedrock), and Deepgram Aura TTS into one WebSocket connection. Send mic audio in, receive agent audio out; turn detection, barge-in, and function calling are all handled for you. Best for: voice agents that don't need component-level swapping, fast launches, single-vendor billing. Works with: any WebSocket-capable platform; Python, JS, and Go SDKs. Setup time: 15 minutes.
Set up the WebSocket agent
```python
import asyncio
import os

from deepgram import DeepgramClient

dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

agent_config = {
    "type": "SettingsConfiguration",
    "audio": {
        "input": {"encoding": "linear16", "sample_rate": 16000},
        "output": {"encoding": "linear16", "sample_rate": 24000, "container": "none"},
    },
    "agent": {
        "listen": {"model": "nova-3"},
        "speak": {"model": "aura-2-luna-en"},
        "think": {
            "provider": {"type": "anthropic"},
            "model": "claude-3-5-sonnet-20241022",
            "instructions": "You are a friendly customer support agent for TokRepo. Keep replies under 2 sentences.",
        },
    },
}

async def run_agent():
    agent = dg.agent.websocket.v("1")
    await agent.start(agent_config)

    async def on_audio_output(data: bytes):
        # Play this on speakers (or forward to a WebRTC peer)
        await play_audio(data)

    agent.on("AudioOutput", on_audio_output)
    agent.on("ConversationText", lambda role, content: print(f"{role}: {content}"))
    agent.on("UserStartedSpeaking", lambda: print("user speaking: barge-in"))

    # Feed mic audio; mic_audio() and play_audio() are app-specific (see sketch below)
    async for chunk in mic_audio():
        await agent.send(chunk)

asyncio.run(run_agent())
```
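The snippet above leaves `mic_audio()` and `play_audio()` to you. Here is a minimal sketch using the sounddevice library (the library choice and 100 ms chunking are assumptions; any audio I/O stack works), matching the 16 kHz linear16 input and 24 kHz output declared in `agent_config`:

```python
import asyncio
import sounddevice as sd  # assumption: pip install sounddevice

async def mic_audio(chunk_ms: int = 100):
    """Yield raw linear16 mic chunks at 16 kHz (matches agent_config input)."""
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def callback(indata, frames, time_info, status):
        # Runs on the audio thread; copy the buffer and hand it to the event loop
        loop.call_soon_threadsafe(queue.put_nowait, bytes(indata))

    blocksize = int(16000 * chunk_ms / 1000)
    with sd.RawInputStream(samplerate=16000, channels=1, dtype="int16",
                           blocksize=blocksize, callback=callback):
        while True:
            yield await queue.get()

# One long-lived output stream at 24 kHz (matches agent_config output)
_speaker = sd.RawOutputStream(samplerate=24000, channels=1, dtype="int16")
_speaker.start()

async def play_audio(data: bytes):
    # RawOutputStream.write blocks, so push it off the event loop
    await asyncio.to_thread(_speaker.write, data)
```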
agent_config["agent"]["think"]["functions"] = [{
"name": "lookup_order",
"description": "Look up an order by ID",
"parameters": {
"type": "object",
"properties": {"order_id": {"type": "string"}},
"required": ["order_id"],
},
}]
agent.on("FunctionCallRequest", lambda fn_call: handle_function(fn_call))LLM provider options
LLM provider options

| Provider | Notes |
|---|---|
| openai | gpt-4o, gpt-4o-mini |
| anthropic | claude-3-5-sonnet, haiku |
| groq | Llama 3.3 70B at 280 tok/s, lowest latency |
| aws_bedrock | Bedrock-hosted models (good for regulated AWS shops) |
| custom | Any OpenAI-compatible endpoint |
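Swapping providers is a one-field change in the think section. For example, to move from Anthropic to Groq (the model ID `llama-3.3-70b-versatile` is an assumption; verify against Groq's current model list):

```python
agent_config["agent"]["think"] = {
    "provider": {"type": "groq"},
    "model": "llama-3.3-70b-versatile",  # assumption: check Groq's model list
    "instructions": "You are a friendly customer support agent for TokRepo. Keep replies under 2 sentences.",
}
```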
Aura TTS voice cheat sheet

| Voice ID | Best for |
|---|---|
| aura-2-luna-en | Default: warm American female |
| aura-2-stella-en | Energetic, podcast-style |
| aura-2-asteria-en | Calm British female |
| aura-2-orion-en | Authoritative American male |
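Voices swap the same way, with no other config changes:

```python
agent_config["agent"]["speak"]["model"] = "aura-2-stella-en"  # energetic, podcast-style
```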
Voice Agent vs DIY pipeline
| Need | Choose |
|---|---|
| Ship fast, single vendor | Voice Agent API |
| Use any TTS / STT / LLM mix | DIY (LiveKit Agents) |
| Need ultra-low TTS latency | DIY with Cartesia TTS |
| Need open-weight LLM at low cost | DIY with Groq Llama 3.3 |
FAQ
Q: How is this different from ElevenLabs ConvAI?
A: Both are managed voice agent APIs. Deepgram leans on its in-house STT strength and lets you pick the LLM; ElevenLabs leans on its TTS strength. If STT quality matters more (call centers, noisy audio) → Deepgram. If voice naturalness matters more (consumer brand) → ElevenLabs.
Q: Turn detection — how good?
A: Deepgram uses VAD + utterance-end signals (default 1000ms silence threshold). Tune endpointing for snappier (300ms) or more patient (2000ms) cutoffs. Aggressive endpointing risks chopping speech; conservative wastes time.
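If you want snappier or more patient turns, endpointing is tuned in the listen config. A sketch, assuming the agent passes through Deepgram's standard endpointing parameter (the key name and placement are assumptions; verify against the Voice Agent docs):

```python
# Assumption: key name/placement unverified; check the Voice Agent docs
agent_config["agent"]["listen"]["endpointing"] = 300  # ms of silence that ends a turn
```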
Q: Pricing model?
A: Bundled per-minute pricing billed on conversation duration, roughly $0.08/min on standard configurations (so 10,000 conversation minutes comes to about $800/month). Cheaper than DIY at low volume; DIY wins at high volume, where you can optimize per-component costs.
Source & Thanks
Built by Deepgram. Voice Agent docs at developers.deepgram.com/docs/voice-agent.