# Deepgram Voice Agent API — Unified STT+LLM+TTS

> Deepgram Voice Agent API bundles STT + your LLM + Aura TTS into one WebSocket. Full-duplex voice. Turn detection and barge-in configurable.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`.

## Quick Use

1. `pip install deepgram-sdk`
2. Build `agent_config` with listen/think/speak sections
3. `dg.agent.websocket.v('1').start(agent_config)` + send mic audio + play `AudioOutput`

---

## Intro

The Deepgram Voice Agent API bundles Deepgram STT (Nova-3), your LLM of choice (Anthropic, OpenAI, Groq, AWS Bedrock), and Deepgram Aura TTS into one WebSocket connection. Send mic audio in, receive agent audio out — turn detection, barge-in, and function calling are all handled.

Best for: voice agents that don't need component-level swaps, fast launch, single-vendor billing.

Works with: any WebSocket-capable platform; Python, JS, and Go SDKs.

Setup time: 15 minutes.

---

### Set up the WebSocket agent

```python
import asyncio
import os

from deepgram import DeepgramClient

dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

agent_config = {
    "type": "SettingsConfiguration",
    "audio": {
        "input": {"encoding": "linear16", "sample_rate": 16000},
        "output": {"encoding": "linear16", "sample_rate": 24000, "container": "none"},
    },
    "agent": {
        "listen": {"model": "nova-3"},
        "speak": {"model": "aura-2-luna-en"},
        "think": {
            "provider": {"type": "anthropic"},
            "model": "claude-3-5-sonnet-20241022",
            "instructions": "You are a friendly customer support agent for TokRepo. Keep replies under 2 sentences.",
        },
    },
}

async def run_agent():
    agent = dg.agent.websocket.v("1")
    await agent.start(agent_config)

    async def on_audio_output(data: bytes):
        # Play this on speakers (or send to a WebRTC peer)
        await play_audio(data)

    agent.on("AudioOutput", on_audio_output)
    agent.on("ConversationText", lambda role, content: print(f"{role}: {content}"))
    agent.on("UserStartedSpeaking", lambda: print("user speaking — barge-in"))

    # Feed mic audio to the agent
    async for chunk in mic_audio():
        await agent.send(chunk)

asyncio.run(run_agent())
```
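The snippet above calls `mic_audio()` and `play_audio()` without defining them; they are placeholders for your app's audio I/O. A minimal sketch of both, assuming the third-party `sounddevice` library; the helper names come from the snippet above, and the rates and chunk size simply mirror `agent_config`:

```python
import asyncio

import sounddevice as sd  # third-party: pip install sounddevice

IN_RATE, OUT_RATE = 16000, 24000  # must match agent_config["audio"]

async def mic_audio(chunk_ms: int = 50):
    """Yield raw linear16 mic chunks at 16 kHz."""
    frames = IN_RATE * chunk_ms // 1000
    loop = asyncio.get_running_loop()
    with sd.RawInputStream(samplerate=IN_RATE, channels=1, dtype="int16") as stream:
        while True:
            # stream.read blocks, so run it on a worker thread to keep
            # the event loop (and the agent websocket) responsive.
            data, _overflowed = await loop.run_in_executor(None, stream.read, frames)
            yield bytes(data)

# One persistent speaker stream at the agent's 24 kHz output rate.
_speaker = sd.RawOutputStream(samplerate=OUT_RATE, channels=1, dtype="int16")
_speaker.start()

async def play_audio(data: bytes):
    """Write one linear16 chunk from an AudioOutput event to the speakers."""
    _speaker.write(data)
```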
### Function calling

```python
agent_config["agent"]["think"]["functions"] = [{
    "name": "lookup_order",
    "description": "Look up an order by ID",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}]

agent.on("FunctionCallRequest", handle_function)
```
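`handle_function` is left undefined above. A minimal sketch of one way to complete the loop, with loud caveats: the `FunctionCallResponse` message shape and field names (`function_name`, `function_call_id`, `input`, `output`) are assumptions to verify against the Voice Agent docs; `fetch_order` is a hypothetical backend call; and it assumes the same `agent.send` used for audio also accepts JSON text frames:

```python
import json

async def handle_function(fn_call):
    """Run the requested tool locally, then return the result to the agent.

    fn_call is treated as a dict here; the SDK may hand you a typed object
    with the same fields. Field names are assumptions; verify them against
    the Voice Agent reference.
    """
    if fn_call["function_name"] == "lookup_order":
        order_id = fn_call["input"]["order_id"]
        result = fetch_order(order_id)  # hypothetical: your own backend lookup
    else:
        result = {"error": f"unknown function: {fn_call['function_name']}"}

    # Echo the call ID back so the LLM can match the result to its request
    # and speak the answer to the caller.
    await agent.send(json.dumps({
        "type": "FunctionCallResponse",
        "function_call_id": fn_call["function_call_id"],
        "output": json.dumps(result),
    }))
```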
### LLM provider options

| Provider | Notes |
|---|---|
| `openai` | gpt-4o, gpt-4o-mini |
| `anthropic` | claude-3-5-sonnet, haiku |
| `groq` | Llama 3.3 70B at 280 tok/s — lowest latency |
| `aws_bedrock` | Bedrock-hosted models (good for regulated AWS shops) |
| `custom` | Any OpenAI-compatible endpoint |

### Aura TTS voice cheat sheet

| Voice ID | Best for |
|---|---|
| `aura-2-luna-en` | Default — warm American female |
| `aura-2-stella-en` | Energetic, podcast-style |
| `aura-2-asteria-en` | Calm British female |
| `aura-2-orion-en` | Authoritative American male |

### Voice Agent vs DIY pipeline

| Need | Choose |
|---|---|
| Ship fast, single vendor | **Voice Agent API** |
| Use any TTS / STT / LLM mix | DIY (LiveKit Agents) |
| Need ultra-low TTS latency | DIY with Cartesia TTS |
| Need open-weight LLM at low cost | DIY with Groq Llama 3.3 |

---

### FAQ

**Q: How is this different from ElevenLabs ConvAI?**
A: Both are managed voice agent APIs. Deepgram leans on its in-house STT strength and lets you pick the LLM; ElevenLabs leans on its TTS strength. If STT quality matters more (call centers, noisy audio) → Deepgram. If voice naturalness matters more (consumer brands) → ElevenLabs.

**Q: Turn detection — how good?**
A: Deepgram uses VAD plus utterance-end signals (default 1000 ms silence threshold). Tune `endpointing` for snappier (300 ms) or more patient (2000 ms) cutoffs; a config sketch appears at the end of this page. Aggressive endpointing risks chopping off speech mid-utterance; conservative endpointing wastes time after the user finishes.

**Q: Pricing model?**
A: Bundled per-minute pricing, billed on conversation duration. Roughly $0.08/min on standard configurations, so 10,000 minutes runs about $800. Cheaper than DIY at low volume; DIY wins at high volume, where you can optimize per-component costs.

---

## Source & Thanks

> Built by [Deepgram](https://github.com/deepgram). Voice Agent docs at [developers.deepgram.com/docs/voice-agent](https://developers.deepgram.com).
>
> [deepgram/deepgram-python-sdk](https://github.com/deepgram/deepgram-python-sdk)

---

Source: https://tokrepo.com/en/workflows/deepgram-voice-agent-api-unified-stt-llm-tts
Author: Deepgram
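### Appendix: tuning `endpointing`

A sketch of the turn-detection tuning mentioned in the FAQ. One loud assumption: that the Voice Agent's `listen` block accepts an `endpointing` value in milliseconds the way Deepgram's plain STT streaming API does. The key placement below is unverified, so check the Voice Agent settings reference before relying on it.

```python
# Snappier turn-taking: treat 300 ms of silence as end of the user's turn.
# ASSUMPTION: "endpointing" inside the listen block is modeled on the plain
# STT streaming API's parameter; verify against the Voice Agent docs.
agent_config["agent"]["listen"]["endpointing"] = 300

# More patient: wait a full 2 seconds before treating silence as end-of-turn.
# agent_config["agent"]["listen"]["endpointing"] = 2000
```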