# LiveKit Agents — Python Framework for Voice AI

> LiveKit Agents is a Python framework for real-time voice AI. Pluggable STT/LLM/TTS, VAD, barge-in. Run on LiveKit Cloud or self-host.

## Quick Use

1. `pip install livekit-agents livekit-plugins-openai livekit-plugins-deepgram livekit-plugins-cartesia livekit-plugins-silero`
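2. Sign up at cloud.livekit.io and set `LIVEKIT_URL`, `LIVEKIT_API_KEY`, and `LIVEKIT_API_SECRET`, plus the API keys for your STT/LLM/TTS providers
3. `python agent.py dev` starts a development worker; the agent joins your test room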
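Before running the worker, it can help to fail fast when a variable is missing. This is a minimal standard-library sketch. The three `LIVEKIT_*` names come from the step above. The provider key names (`OPENAI_API_KEY`, `DEEPGRAM_API_KEY`, `CARTESIA_API_KEY`) are assumptions about what the installed plugins read, so verify them against each plugin's docs.

```python
import os
from typing import List, Mapping, Optional, Sequence

# Read by the LiveKit worker to reach your project.
LIVEKIT_VARS = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")
# Assumed key names for the OpenAI, Deepgram, and Cartesia plugins; verify in each plugin's docs.
PROVIDER_VARS = ("OPENAI_API_KEY", "DEEPGRAM_API_KEY", "CARTESIA_API_KEY")


def missing_env(names: Sequence[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the variables from `names` that are unset or empty (checks os.environ by default)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]
```

You could place a check like `if missing := missing_env(LIVEKIT_VARS + PROVIDER_VARS): raise SystemExit(f"missing: {missing}")` at the top of `agent.py`. That stops the worker before it tries to connect with incomplete credentials.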

---

## Intro

LiveKit Agents is a Python framework purpose-built for real-time voice AI. It chains STT, LLM, and TTS together and handles VAD, end-of-turn detection, and barge-in (the user interrupting the agent) out of the box. It runs on LiveKit Cloud or on a self-hosted LiveKit Server over WebRTC. Best for: voice agents on phone calls, browser voice chat, in-app voice copilots, and anywhere a round trip under 1.5 seconds matters. Works with: Python 3.10+; STT such as Deepgram, AssemblyAI, or Groq Whisper; LLMs such as OpenAI, Anthropic, or Llama; TTS such as Cartesia, ElevenLabs, or Deepgram. Setup time: about 10 minutes.
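End-of-turn detection decides when the user has finished speaking, so the agent knows when to reply. A common baseline is a silence timeout layered on per-frame VAD decisions. The toy below illustrates that baseline only. `Endpointer` and its fields are hypothetical names, not LiveKit's API, and this is not how LiveKit implements turn detection.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Endpointer:
    """Toy silence-timeout endpointer (hypothetical; not LiveKit's implementation).

    Each input frame is `frame_ms` long and already classified as speech or
    silence (e.g. by a VAD). The turn ends once `min_silence_ms` of continuous
    silence follows at least `min_speech_ms` of accumulated speech.
    """

    frame_ms: int = 20
    min_speech_ms: int = 100
    min_silence_ms: int = 300

    def end_of_turn_frame(self, is_speech: Iterable[bool]) -> Optional[int]:
        """Return the index of the frame where the turn ends, or None if it has not ended."""
        speech_ms = 0
        silence_ms = 0
        for i, speech in enumerate(is_speech):
            if speech:
                speech_ms += self.frame_ms
                silence_ms = 0  # any speech resets the silence timer
            elif speech_ms >= self.min_speech_ms:
                # Only count silence once enough speech has been heard,
                # so short noises do not start a turn.
                silence_ms += self.frame_ms
                if silence_ms >= self.min_silence_ms:
                    return i
        return None
```

With 20 ms frames and a 300 ms silence threshold, ten speech frames followed by fifteen silent frames end the turn at frame index 24. Raising `min_silence_ms` makes the agent less likely to cut the user off mid-pause, at the cost of a slower reply. That trade-off is the "VAD end-of-speech" cost in the latency budget.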

---

### Install

```bash
pip install livekit-agents \
  livekit-plugins-openai livekit-plugins-deepgram livekit-plugins-cartesia livekit-plugins-silero
```

### Minimum viable voice agent

This example targets the 1.x API (`AgentSession` plus `Agent`). Older 0.x releases used `VoicePipelineAgent` instead.

```python
from livekit.agents import Agent, AgentSession, AutoSubscribe, JobContext, WorkerOptions, cli
from livekit.plugins import cartesia, deepgram, openai, silero


async def entrypoint(ctx: JobContext):
    # Join the room and subscribe to audio tracks only.
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # The session wires VAD -> STT -> LLM -> TTS and manages turn-taking and interruptions.
    session = AgentSession(
        vad=silero.VAD.load(),
        stt=deepgram.STT(model="nova-3", language="multi"),
        llm=openai.LLM(model="gpt-4o-mini"),
        # Cartesia's default voice; pass voice="<Cartesia voice ID>" to choose another.
        tts=cartesia.TTS(),
    )

    # The system prompt lives on the Agent as `instructions`.
    agent = Agent(
        instructions="You are a helpful voice assistant. Keep replies short, under 2 sentences.",
    )

    await session.start(room=ctx.room, agent=agent)
    await session.say("Hi! What can I help you with?", allow_interruptions=True)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

### Function calling (tool use mid-conversation)

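```python
from livekit.agents import Agent, function_tool


async def my_weather_api(location: str) -> str:
    # Hypothetical placeholder: replace with your real weather lookup.
    return f"It is sunny in {location}."


@function_tool
async def get_weather(location: str) -> str:
    """Get the current weather for a location."""
    # The docstring and type hints become the tool description and schema the LLM sees.
    return await my_weather_api(location)


# Tools are attached to the Agent; the LLM can call them mid-conversation.
agent = Agent(
    instructions="You are a helpful voice assistant.",
    tools=[get_weather],
)
```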
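Conceptually, a tool call is a round trip. The LLM emits a tool name plus JSON-encoded arguments, the agent runs the matching Python function, and the result goes back into the conversation for the LLM's next turn. The toy dispatcher below illustrates only the middle step. `TOOLS`, `register`, and `run_tool_call` are hypothetical names, not LiveKit API; the framework performs this step for you when you give tools to the agent.

```python
import asyncio
import json
from typing import Awaitable, Callable, Dict

# Hypothetical registry mapping tool names to async functions.
ToolFn = Callable[..., Awaitable[str]]
TOOLS: Dict[str, ToolFn] = {}


def register(fn: ToolFn) -> ToolFn:
    """Hypothetical decorator: record an async tool under its function name."""
    TOOLS[fn.__name__] = fn
    return fn


@register
async def get_weather(location: str) -> str:
    """Stub standing in for a real weather lookup."""
    return f"sunny in {location}"


async def run_tool_call(name: str, arguments_json: str) -> str:
    """Run one tool call as an LLM would request it: a name plus JSON-encoded arguments."""
    tool = TOOLS.get(name)
    if tool is None:
        # Report the problem back to the LLM instead of crashing the conversation.
        return f"error: unknown tool {name!r}"
    kwargs = json.loads(arguments_json)
    return await tool(**kwargs)


# The string result would be appended to the chat context for the LLM's next turn.
result = asyncio.run(run_tool_call("get_weather", '{"location": "Paris"}'))
```

Returning an error string for an unknown tool, rather than raising, keeps the voice conversation alive. It also gives the LLM a chance to recover on its next turn.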

### Latency budget

| Stage | Typical | Tight target |
|---|---|---|
| VAD end-of-speech | 200–500ms | 200ms |
| STT (Deepgram Nova-3) | 60–250ms | 100ms |
| LLM (gpt-4o-mini streaming) | 300–800ms | 400ms |
| TTS first audio (Cartesia) | 75–200ms | 100ms |
| Network + WebRTC | 50–150ms | 80ms |
| **Total round trip** | ~685–1,900ms | **~880ms** |
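The totals are column sums over the stages. A quick sketch makes the arithmetic explicit. The numbers are copied from the latency table; `BUDGET_MS` and `totals` are illustrative names. The model treats the stages as strictly sequential, which is the conservative reading, since streaming lets LLM and TTS overlap in practice.

```python
from typing import Dict, Tuple

# (typical low, typical high, tight target) per stage, in milliseconds, from the latency table.
BUDGET_MS: Dict[str, Tuple[int, int, int]] = {
    "VAD end-of-speech": (200, 500, 200),
    "STT (Deepgram Nova-3)": (60, 250, 100),
    "LLM (gpt-4o-mini streaming)": (300, 800, 400),
    "TTS first audio (Cartesia)": (75, 200, 100),
    "Network + WebRTC": (50, 150, 80),
}


def totals(budget: Dict[str, Tuple[int, int, int]]) -> Tuple[int, int, int]:
    """Sum each column, treating the stages as strictly sequential."""
    low = sum(stage[0] for stage in budget.values())
    high = sum(stage[1] for stage in budget.values())
    tight = sum(stage[2] for stage in budget.values())
    return low, high, tight


print(totals(BUDGET_MS))  # (685, 1900, 880)
```

The tight column reproduces the ~880ms total. The worst typical case (1,900ms) overshoots the sub-1.5-second goal, which suggests that at least some stages need to run near their tight values.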

### Run locally with CLI

```bash
python agent.py dev   # development mode: connects to LIVEKIT_URL, reloads on code changes
python agent.py start # production worker mode
```

---

### FAQ

**Q: LiveKit Agents vs Vapi vs Retell?**
A: Vapi and Retell are managed, turnkey voice agent platforms. They are fast to ship, with an opinionated stack and less flexibility. LiveKit Agents is bring-your-own-components: you pick the STT/LLM/TTS, deploy to your own infrastructure, and optimize each stage. Choose LiveKit when you need control or cost optimization at scale.

**Q: Can I use it without WebRTC?**
A: For phone calls, yes: LiveKit supports SIP trunking, so callers join the agent's LiveKit room. For HTTP-only environments, no, because the framework is built on the LiveKit room model. Alternatives: build your own pipeline directly on Twilio Media Streams, or use a managed platform such as Vapi.

**Q: How are interruptions handled?**
A: When the VAD detects the user starting to speak, the framework cancels the in-flight TTS playback. It then trims the assistant's interrupted reply in the chat history to what was actually spoken and routes the new user audio to STT. Tune sensitivity via `silero.VAD.load(min_speech_duration=...)`. A higher value requires longer speech before it counts, so brief noises are less likely to trigger an interruption.
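The interruption flow can be modeled with asyncio task cancellation. Playback runs as a task, a simulated VAD event cancels it, and only the words that were actually played are kept for chat history. This is a toy model, not LiveKit's internals. `speak` and `reply_with_barge_in` are hypothetical, and a fixed timer stands in for the VAD.

```python
import asyncio
from typing import List


async def speak(words: List[str], spoken: List[str], per_word_s: float = 0.05) -> None:
    """Pretend TTS playback: 'play' one word at a time, recording each finished word."""
    for word in words:
        await asyncio.sleep(per_word_s)
        spoken.append(word)  # only reached if the word finished playing


async def reply_with_barge_in(reply: str, interrupt_after_s: float) -> str:
    """Play `reply`, cancel it when the simulated VAD fires, and return the text actually spoken."""
    spoken: List[str] = []
    playback = asyncio.create_task(speak(reply.split(), spoken))
    await asyncio.sleep(interrupt_after_s)  # stand-in for "VAD detected user speech"
    if not playback.done():
        playback.cancel()  # stop the in-flight TTS
        try:
            await playback
        except asyncio.CancelledError:
            pass
    # Only the words the user heard go into chat history.
    return " ".join(spoken)
```

Cancelling the playback task, rather than letting it finish in the background, keeps the chat history consistent with what the user heard. It also frees the audio output for the next reply.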

---

## Source & Thanks

> Built by [LiveKit](https://github.com/livekit). Licensed under Apache-2.0.
>
> [livekit/agents](https://github.com/livekit/agents) — ⭐ 4,500+

---
Source: https://tokrepo.com/en/workflows/livekit-agents-python-framework-for-voice-ai
Author: LiveKit