Skills · May 11, 2026 · 5 min read

Deepgram Voice Agent API — Unified STT+LLM+TTS

Deepgram Voice Agent API bundles STT + your LLM + Aura TTS into one WebSocket. Full-duplex voice. Turn detection and barge-in configurable.

Deepgram
Deepgram · Community
Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, the JSON metadata, a per-adapter plan, and the raw content to help agents judge fit, risk, and next actions.

Stage only · 17/100
Agent surface: any MCP/CLI agent
Type: Skill
Installation: Stage only
Trust: New
Entry point: Asset
Universal CLI command:
npx tokrepo install 848be675-d14d-45b0-a887-9c440d433ee7
Introduction

The Deepgram Voice Agent API bundles Deepgram STT (Nova-3), your LLM of choice (Anthropic, OpenAI, Groq, AWS Bedrock), and Deepgram Aura TTS into one WebSocket connection. Send mic audio in, receive agent audio out — turn detection, barge-in, and function calling are all handled. Best for: voice agents that don't need component-level swapping, want a fast launch, and prefer single-vendor billing. Works with: any WebSocket-capable platform; Python, JS, and Go SDKs. Setup time: 15 minutes.


Set up the WebSocket agent

import asyncio, os
from deepgram import DeepgramClient

dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

agent_config = {
    "type": "SettingsConfiguration",
    "audio": {
        "input":  {"encoding": "linear16", "sample_rate": 16000},
        "output": {"encoding": "linear16", "sample_rate": 24000, "container": "none"},
    },
    "agent": {
        "listen": {"model": "nova-3"},
        "speak":  {"model": "aura-2-luna-en"},
        "think": {
            "provider": {"type": "anthropic"},
            "model":     "claude-3-5-sonnet-20241022",
            "instructions": "You are a friendly customer support agent for TokRepo. Keep replies under 2 sentences.",
        },
    },
}

async def run_agent():
    agent = dg.agent.websocket.v("1")
    await agent.start(agent_config)

    async def on_audio_output(data: bytes):
        # Play this on speakers (or send to WebRTC peer)
        await play_audio(data)

    agent.on("AudioOutput", on_audio_output)
    agent.on("ConversationText", lambda role, content: print(f"{role}: {content}"))
    agent.on("UserStartedSpeaking", lambda: print("user speaking — barge-in"))

    # Feed mic audio (mic_audio() is an app-specific async generator of raw PCM chunks)
    async for chunk in mic_audio():
        await agent.send(chunk)

asyncio.run(run_agent())
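The loop above assumes `mic_audio()` and `play_audio()` helpers that the SDK does not provide. A minimal hardware-free sketch (silence in, buffer out), keeping the frame math consistent with `agent_config`, might look like this; a real app would swap in sounddevice/pyaudio for capture and a speaker or WebRTC track for playback:

```python
# Hypothetical stand-ins for mic_audio() / play_audio() so the agent loop
# can be exercised without real audio hardware.
import asyncio

CHUNK_MS = 20                 # send 20 ms frames, a common real-time size
SAMPLE_RATE = 16_000          # matches agent_config audio.input
BYTES_PER_SAMPLE = 2          # linear16 = 16-bit PCM
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS // 1000

async def mic_audio(n_chunks: int = 5):
    """Yield raw PCM frames; here, silence instead of a live microphone."""
    for _ in range(n_chunks):
        yield b"\x00" * CHUNK_BYTES
        await asyncio.sleep(CHUNK_MS / 1000)  # pace like a real mic

playback_buffer = bytearray()

async def play_audio(data: bytes):
    """Collect agent audio; a real app writes this to the sound device."""
    playback_buffer.extend(data)
```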

Function calling

agent_config["agent"]["think"]["functions"] = [{
    "name": "lookup_order",
    "description": "Look up an order by ID",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}]

agent.on("FunctionCallRequest", lambda fn_call: handle_function(fn_call))

LLM provider options

| Provider | Notes |
| --- | --- |
| openai | gpt-4o, gpt-4o-mini |
| anthropic | claude-3-5-sonnet, haiku |
| groq | Llama 3.3 70B at 280 tok/s — lowest latency |
| aws_bedrock | Bedrock-hosted models (good for regulated AWS shops) |
| custom | Any OpenAI-compatible endpoint |
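Swapping the `think` provider is a one-dict change in `agent_config`. A minimal sketch using the groq row above (the model id is an example; confirm current names in Deepgram's docs):

```python
# Start from a minimal config and retarget the LLM; only the "think"
# section changes, listen/speak stay as configured.
agent_config = {
    "agent": {
        "think": {
            "provider": {"type": "anthropic"},
            "model": "claude-3-5-sonnet-20241022",
            "instructions": "You are a concise support agent.",
        }
    }
}

agent_config["agent"]["think"].update(
    provider={"type": "groq"},
    model="llama-3.3-70b-versatile",  # example id for Llama 3.3 70B
)
```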

Aura TTS voice cheat sheet

| Voice ID | Best for |
| --- | --- |
| aura-2-luna-en | Default — warm American female |
| aura-2-stella-en | Energetic, podcast-style |
| aura-2-asteria-en | Calm British female |
| aura-2-orion-en | Authoritative American male |

Voice Agent vs DIY pipeline

| Need | Choose |
| --- | --- |
| Ship fast, single vendor | Voice Agent API |
| Use any TTS / STT / LLM mix | DIY (LiveKit Agents) |
| Need ultra-low TTS latency | DIY with Cartesia TTS |
| Need open-weight LLM at low cost | DIY with Groq Llama 3.3 |

FAQ

Q: How is this different from ElevenLabs ConvAI? A: Both are managed voice agent APIs. Deepgram leans on its in-house STT strength and lets you pick the LLM; ElevenLabs leans on its TTS strength. If STT quality matters more (call centers, noisy audio) → Deepgram. If voice naturalness matters more (consumer brand) → ElevenLabs.

Q: Turn detection — how good? A: Deepgram uses VAD + utterance-end signals (default 1000ms silence threshold). Tune endpointing for snappier (300ms) or more patient (2000ms) cutoffs. Aggressive endpointing risks chopping speech; conservative wastes time.
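As a sketch, the silence threshold could be tuned in the listen section. The `endpointing` key (milliseconds) mirrors Deepgram's STT API; whether the agent's settings message accepts it in this exact shape is an assumption to verify against the Voice Agent docs:

```python
# Hypothetical: snappier end-of-turn detection (300 ms instead of the
# 1000 ms default).
listen_config = {"model": "nova-3", "endpointing": 300}

# Patient variant for callers who pause mid-thought:
patient_listen = {**listen_config, "endpointing": 2000}
```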

Q: Pricing model? A: Bundled per-minute pricing, billed on conversation duration, roughly $0.08/min on standard configurations. Cheaper than DIY at low volume; DIY wins at high volume, where you can optimize per-component costs.
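A back-of-envelope break-even check based on the quoted ~$0.08/min figure; the DIY per-minute cost and fixed overhead below are placeholder assumptions, not quoted prices:

```python
BUNDLED_PER_MIN = 0.08     # quoted standard-config rate
DIY_PER_MIN = 0.05         # placeholder: assumed optimized per-component cost
DIY_FIXED_MONTHLY = 600.0  # placeholder: infra + engineering overhead

def monthly_cost(minutes: float, per_min: float, fixed: float = 0.0) -> float:
    return minutes * per_min + fixed

# Under these assumptions the bundle wins below roughly 20,000 min/month:
break_even = DIY_FIXED_MONTHLY / (BUNDLED_PER_MIN - DIY_PER_MIN)
```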


Quick Use

  1. pip install deepgram-sdk
  2. Build agent_config with listen/think/speak sections
  3. dg.agent.websocket.v('1').start(agent_config) + send mic audio + play AudioOutput

Source & Thanks

Built by Deepgram. Voice Agent docs at developers.deepgram.com/docs/voice-agent.

deepgram/deepgram-python-sdk

🙏
