What is Deepgram Voice Agent API — Unified STT+LLM+TTS?

Deepgram Voice Agent API bundles Deepgram STT, an LLM of your choice, and Aura TTS into a single WebSocket connection for full-duplex voice. Turn detection and barge-in are configurable.

Is Deepgram Voice Agent API — Unified STT+LLM+TTS free to use?

Yes. Deepgram Voice Agent API — Unified STT+LLM+TTS is freely available on TokRepo. Check the Source & Thanks section on the asset page for the specific open-source license.

How do I install Deepgram Voice Agent API — Unified STT+LLM+TTS?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

Deepgram Voice Agent API — Unified STT+LLM+TTS

Name: Deepgram Voice Agent API — Unified STT+LLM+TTS
Author: Deepgram

Introduction

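Deepgram Voice Agent API packages Deepgram STT (Nova-3), an LLM of your choice (Anthropic / OpenAI / Groq / AWS Bedrock), and Deepgram Aura TTS into a single WebSocket connection. Microphone audio goes in and agent audio comes out, with turn detection, barge-in, and function calling handled for you. It suits voice agents that do not need to swap individual components, teams that want to ship quickly, and anyone who prefers a single-vendor bill. It runs on any platform that can open a WebSocket, with SDKs for Python, JS, and Go. Setup takes about 15 minutes.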

Configure the WebSocket agent

import asyncio
import os

from deepgram import DeepgramClient

dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

agent_config = {
    "type": "SettingsConfiguration",
    "audio": {
        "input":  {"encoding": "linear16", "sample_rate": 16000},
        "output": {"encoding": "linear16", "sample_rate": 24000, "container": "none"},
    },
    "agent": {
        "listen": {"model": "nova-3"},
        "speak":  {"model": "aura-2-luna-en"},
        "think": {
            "provider": {"type": "anthropic"},
            "model":     "claude-3-5-sonnet-20241022",
            "instructions": "You are a friendly customer-support agent for TokRepo. Keep replies to two sentences or fewer.",
        },
    },
}

async def run_agent():
    agent = dg.agent.websocket.v("1")
    await agent.start(agent_config)

    async def on_audio_output(data: bytes):
        # Play on the speaker (or forward to a WebRTC peer); play_audio is supplied by your app
        await play_audio(data)

    agent.on("AudioOutput", on_audio_output)
    agent.on("ConversationText", lambda role, content: print(f"{role}: {content}"))
    agent.on("UserStartedSpeaking", lambda: print("User started speaking: barge-in"))

    # Feed microphone audio; mic_audio is an async generator supplied by your app
    async for chunk in mic_audio():
        await agent.send(chunk)

asyncio.run(run_agent())
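The loop above relies on a `mic_audio()` async generator that the snippet does not define. As a minimal sketch of what such a source might look like, the following splits raw linear16 mono PCM into fixed-duration frames and yields them at roughly real-time pace. The names `frame_bytes`, `pcm_frames`, and `fake_mic_audio` are hypothetical, and a real app would read from an audio device rather than an in-memory buffer.

```python
import asyncio
from typing import AsyncIterator, Iterator

BYTES_PER_SAMPLE = 2  # linear16 = 16-bit signed PCM; mono is assumed


def frame_bytes(sample_rate: int, frame_ms: int) -> int:
    """Number of bytes in one frame of mono linear16 audio."""
    return sample_rate * frame_ms // 1000 * BYTES_PER_SAMPLE


def pcm_frames(pcm: bytes, sample_rate: int = 16000, frame_ms: int = 20) -> Iterator[bytes]:
    """Split a PCM buffer into frames; the final frame may be shorter."""
    size = frame_bytes(sample_rate, frame_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


async def fake_mic_audio(
    pcm: bytes,
    sample_rate: int = 16000,
    frame_ms: int = 20,
    realtime: bool = True,
) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for mic_audio(): yields frames from a buffer."""
    for frame in pcm_frames(pcm, sample_rate, frame_ms):
        yield frame
        if realtime:
            # Pace the stream so frames arrive about as fast as a live mic would send them.
            await asyncio.sleep(frame_ms / 1000)
```

At the 16000 Hz input rate used in the config, a 20 ms frame is 640 bytes. Small frames keep latency low, since the service can start processing speech before the user finishes a sentence.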

Function calling

agent_config["agent"]["think"]["functions"] = [{
    "name": "lookup_order",
    "description": "Look up an order by its order ID",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}]

# Register inside run_agent(), where `agent` is in scope; handle_function is your own dispatcher.
agent.on("FunctionCallRequest", lambda fn_call: handle_function(fn_call))
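The `handle_function` callback is left undefined above. One possible shape for it is a small registry that maps declared function names to local callables. This sketch assumes the request arrives as a dict with a `name` and an `arguments` field (a JSON string or a dict). That payload shape is an assumption for illustration, not the SDK's confirmed format, and `lookup_order` returns placeholder data.

```python
import json
from typing import Any, Callable, Dict


def lookup_order(order_id: str) -> Dict[str, Any]:
    """Stand-in for a real order lookup; returns placeholder data."""
    return {"order_id": order_id, "status": "shipped"}


# Registry mapping function names declared in the agent config to local callables.
HANDLERS: Dict[str, Callable[..., Dict[str, Any]]] = {"lookup_order": lookup_order}


def handle_function(fn_call: Dict[str, Any]) -> str:
    """Run the requested function and return its result as a JSON string.

    Assumes fn_call looks like {"name": str, "arguments": JSON string or dict};
    the real payload shape depends on the SDK version in use.
    """
    name = fn_call.get("name", "")
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown function: {name}"})

    args = fn_call.get("arguments") or {}
    if isinstance(args, str):
        args = json.loads(args)

    try:
        return json.dumps(handler(**args))
    except TypeError as exc:  # missing or unexpected arguments from the LLM
        return json.dumps({"error": str(exc)})
```

Returning an error object instead of raising gives the LLM a chance to recover in conversation, for example by asking the caller to repeat the order ID.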

LLM provider options

Provider	Notes
`openai`	gpt-4o, gpt-4o-mini
`anthropic`	claude-3-5-sonnet, haiku
`groq`	Llama 3.3 70B at about 280 tokens/s; lowest latency
`aws_bedrock`	Bedrock-hosted models (a good fit for compliance-bound AWS shops)
`custom`	Any OpenAI-compatible endpoint
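Switching providers amounts to changing the `think` block of the config. As a rough sketch that follows the config shape used earlier on this page, a small helper can validate the provider name and build that block. `make_think_config` is a hypothetical helper, and any extra fields a `custom` endpoint may need are not covered here.

```python
from typing import Any, Dict

# Provider identifiers taken from the table above.
SUPPORTED_PROVIDERS = {"openai", "anthropic", "groq", "aws_bedrock", "custom"}


def make_think_config(provider: str, model: str, instructions: str) -> Dict[str, Any]:
    """Build the `think` section of the agent config for a given LLM provider."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(
            f"unsupported provider {provider!r}; expected one of {sorted(SUPPORTED_PROVIDERS)}"
        )
    return {
        "provider": {"type": provider},
        "model": model,
        "instructions": instructions,
    }


# Example: swap the agent's LLM to OpenAI without touching the STT/TTS settings.
think = make_think_config("openai", "gpt-4o-mini", "Keep replies to two sentences or fewer.")
```

The result can be assigned to `agent_config["agent"]["think"]` before the connection is started.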

Aura TTS voice cheat sheet

Voice ID	Best for
`aura-2-luna-en`	Default; warm American female voice
`aura-2-stella-en`	Energetic, podcast style
`aura-2-asteria-en`	Calm British female voice
`aura-2-orion-en`	Authoritative American male voice

Voice Agent vs DIY pipeline

Need	Choice
Fast launch, single vendor	Voice Agent API
Any combination of TTS / STT / LLM	DIY (LiveKit Agents)
Ultra-low TTS latency	DIY with Cartesia TTS
Low-cost open-weight LLM	DIY with Groq Llama 3.3

FAQ

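Q: How does it differ from ElevenLabs ConvAI? A: Both are hosted voice agent APIs. Deepgram leans on its own STT strength and lets you pick the LLM; ElevenLabs leans on its own TTS strength. If STT quality matters more (call centers, noisy environments), choose Deepgram. If voice naturalness matters more (consumer brands), choose ElevenLabs.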

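Q: How accurate is turn detection? A: Deepgram uses VAD plus an utterance-end signal (default silence threshold: 1000 ms). Tune endpointing to cut in sooner (300 ms) or wait longer (2000 ms). Aggressive endpointing risks cutting the speaker off mid-sentence; conservative endpointing adds dead air before each reply.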
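The endpointing trade-off can be illustrated with a toy simulation. This is a simplified sketch of silence-based end-of-turn detection, not Deepgram's actual algorithm: it assumes a VAD has already labelled each fixed-length frame as speech or silence, and `end_of_turn_ms` is a hypothetical helper.

```python
from typing import List, Optional


def end_of_turn_ms(speech_flags: List[bool], frame_ms: int, silence_ms: int) -> Optional[int]:
    """Return the time (ms) at which the turn is declared over, or None.

    speech_flags[i] is True when frame i contains speech. The turn ends once
    `silence_ms` of continuous silence has followed some speech.
    """
    needed = -(-silence_ms // frame_ms)  # ceiling division: silent frames required
    heard_speech = False
    silent_run = 0
    for i, is_speech in enumerate(speech_flags):
        if is_speech:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return (i + 1) * frame_ms
    return None


# 100 ms frames: 500 ms of speech, a 400 ms pause, 500 ms more speech, then 1200 ms of silence.
flags = [True] * 5 + [False] * 4 + [True] * 5 + [False] * 12

aggressive = end_of_turn_ms(flags, frame_ms=100, silence_ms=300)   # fires inside the pause
default = end_of_turn_ms(flags, frame_ms=100, silence_ms=1000)     # waits out the pause
patient = end_of_turn_ms(flags, frame_ms=100, silence_ms=2000)     # never fires in this clip
```

With the 300 ms threshold the turn is closed at 800 ms, before the speaker resumes at 900 ms. With the 1000 ms threshold it closes at 2400 ms, a full second after speech actually stopped at 1400 ms.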

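Q: What is the pricing model? A: Billing is by conversation duration, at roughly $0.08 per minute for a standard configuration. At low volume it is cheaper than DIY; at high volume DIY wins because you can optimize cost per component.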
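To make the volume argument concrete, here is a small cost comparison. Only the $0.08 per minute figure comes from this page. The DIY per-minute rate and fixed monthly overhead used in the example are made-up placeholders chosen purely to illustrate the break-even arithmetic, and the function names are hypothetical.

```python
VOICE_AGENT_RATE = 0.08  # USD per minute, the approximate figure quoted above


def voice_agent_cost(minutes: float, rate: float = VOICE_AGENT_RATE) -> float:
    """Monthly cost of the bundled API at a flat per-minute rate."""
    return minutes * rate


def diy_cost(minutes: float, per_minute: float, fixed_monthly: float) -> float:
    """Monthly cost of a DIY pipeline: fixed overhead plus usage."""
    return fixed_monthly + minutes * per_minute


def break_even_minutes(per_minute: float, fixed_monthly: float,
                       rate: float = VOICE_AGENT_RATE) -> float:
    """Monthly minutes above which DIY becomes cheaper than the bundled API."""
    if per_minute >= rate:
        raise ValueError("DIY never wins unless its per-minute cost is below the bundled rate")
    return fixed_monthly / (rate - per_minute)


# Placeholder DIY numbers (assumptions, not real quotes): $0.05/min plus $1,500/month overhead.
low_volume = (voice_agent_cost(10_000), diy_cost(10_000, 0.05, 1500.0))
high_volume = (voice_agent_cost(100_000), diy_cost(100_000, 0.05, 1500.0))
threshold = break_even_minutes(per_minute=0.05, fixed_monthly=1500.0)
```

Under these placeholder numbers the bundled API is cheaper at 10,000 minutes a month and the DIY pipeline is cheaper at 100,000, with the crossover near 50,000 minutes.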

Related assets

ElevenLabs ConvAI — Full-Duplex Voice Agent Platform

Vapi — Voice AI Agent Platform with STT, LLM & TTS

Cartesia Streaming WebSocket — Full-Duplex Voice Agent TTS

Deepgram Aura TTS — Text-to-Speech for Voice Agents