# LiveKit Agents: Python Framework for Voice AI

> LiveKit Agents is a Python framework for real-time voice AI. Pluggable STT/LLM/TTS, VAD, barge-in. Run on LiveKit Cloud or self-host.

## Quick Use

1. `pip install livekit-agents livekit-plugins-openai livekit-plugins-deepgram livekit-plugins-cartesia livekit-plugins-silero`
2. Sign up at cloud.livekit.io and set `LIVEKIT_URL`, `LIVEKIT_API_KEY`, and `LIVEKIT_API_SECRET`.
3. Save the minimal agent below as `agent.py` and run `python agent.py dev`. The agent joins your test room.

---

## Intro

LiveKit Agents is a Python framework purpose-built for real-time voice AI. It wires STT, LLM, and TTS together, with VAD, end-of-turn detection, and barge-in handling solved out of the box. It runs on LiveKit Cloud or a self-hosted LiveKit Server (WebRTC).

**Best for:** voice agents on phone calls, browser voice chat, in-app voice copilots, and anywhere a sub-1.5-second round trip matters.

**Works with:** Python 3.10+, any STT (Deepgram, AssemblyAI, Groq Whisper), any LLM (OpenAI, Anthropic, Llama), any TTS (Cartesia, ElevenLabs, Deepgram).

**Setup time:** 10 minutes.

---

### Install

```bash
pip install livekit-agents \
  livekit-plugins-openai livekit-plugins-deepgram livekit-plugins-cartesia livekit-plugins-silero
```

### Minimum viable voice agent

The examples use the `VoicePipelineAgent` API from the 0.x releases of `livekit-agents`. The 1.x releases replace it with `AgentSession`, so pin `livekit-agents<1.0` to run them unchanged. The plugins read their provider keys from `OPENAI_API_KEY`, `DEEPGRAM_API_KEY`, and `CARTESIA_API_KEY`.

```python
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import cartesia, deepgram, openai, silero


async def entrypoint(ctx: JobContext):
    # System prompt that seeds the conversation history.
    initial_ctx = llm.ChatContext().append(
        role="system",
        text="You are a helpful voice assistant. Keep replies short, under 2 sentences.",
    )

    # Join the room, subscribing to audio tracks only.
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    # Wait for a participant so the agent has someone to listen to.
    participant = await ctx.wait_for_participant()

    assistant = VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=deepgram.STT(model="nova-3", language="multi"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(),  # default voice; pass voice="<Cartesia voice ID>" to pick another
        chat_ctx=initial_ctx,
    )
    assistant.start(ctx.room, participant)
    await assistant.say("Hi! What can I help you with?", allow_interruptions=True)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

### Function calling (tool use mid-conversation)

```python
from typing import Annotated

from livekit.agents import llm
from livekit.agents.pipeline import VoicePipelineAgent


class AssistantFnc(llm.FunctionContext):
    @llm.ai_callable()
    async def get_weather(
        self,
        location: Annotated[str, llm.TypeInfo(description="City or place name")],
    ) -> str:
        """Get the current weather for a location."""
        # my_weather_api is a placeholder for your own async weather lookup.
        return await my_weather_api(location)


assistant = VoicePipelineAgent(
    ...,  # vad, stt, llm, tts, chat_ctx as in the minimal agent
    fnc_ctx=AssistantFnc(),
)
```

### Latency budget

| Stage | Typical | Tight |
|---|---|---|
| VAD end-of-speech | 200–500ms | 200ms |
| STT (Deepgram Nova-3) | 60–250ms | 100ms |
| LLM (gpt-4o-mini streaming) | 300–800ms | 400ms |
| TTS first audio (Cartesia) | 75–200ms | 100ms |
| Network + WebRTC | 50–150ms | 80ms |
| **Total round-trip** | | **~880ms** |

### Run locally with CLI

```bash
python agent.py dev    # development mode: connects to your LiveKit Cloud URL, reloads on code changes
python agent.py start  # production worker mode
```

---

### FAQ

**Q: LiveKit Agents vs Vapi vs Retell?**
A: Vapi and Retell are managed, turnkey voice agent platforms: fast to ship, with an opinionated stack and less flexibility. LiveKit Agents is bring-your-own components: you pick the STT, LLM, and TTS, deploy to your own infrastructure, and optimize each stage. Choose LiveKit when you need control or cost optimization at scale.

**Q: Can I use it without WebRTC?**
A: For phone calls, yes: LiveKit supports SIP trunking. For HTTP-only environments, no: the framework is built on the LiveKit room model. Alternatives are to build your own pipeline directly on Twilio Media Streams or to use a managed platform such as Vapi.

**Q: How are interruptions handled?**
A: The VAD detects the user starting to speak. The framework then cancels the in-flight TTS playback, truncates the assistant's last incomplete utterance in the chat history, and routes the new user audio to STT. Tune the sensitivity via `silero.VAD.load(min_speech_duration=...)`.

---

## Source & Thanks

> Built by [LiveKit](https://github.com/livekit).
> Licensed under Apache-2.0.
>
> [livekit/agents](https://github.com/livekit/agents) (⭐ 4,500+)

---

Source: https://tokrepo.com/en/workflows/livekit-agents-python-framework-for-voice-ai
Author: LiveKit
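
As a rough check on the latency budget table above, the sketch below sums each column. It assumes the stages run strictly one after another, which is pessimistic because streaming can let stages overlap. The `STAGES` dictionary and both function names are illustrative and not part of LiveKit Agents; the numbers are copied from the table.

```python
# Stage latencies in milliseconds, copied from the latency budget table:
# (typical_low, typical_high, tight)
STAGES = {
    "VAD end-of-speech": (200, 500, 200),
    "STT (Deepgram Nova-3)": (60, 250, 100),
    "LLM (gpt-4o-mini streaming)": (300, 800, 400),
    "TTS first audio (Cartesia)": (75, 200, 100),
    "Network + WebRTC": (50, 150, 80),
}


def budget_totals(stages):
    """Return (typical_low, typical_high, tight) totals in ms.

    Assumes the stages run sequentially with no overlap.
    """
    low = sum(s[0] for s in stages.values())
    high = sum(s[1] for s in stages.values())
    tight = sum(s[2] for s in stages.values())
    return low, high, tight


def largest_stage(stages):
    """Return the name of the stage with the largest tight-budget latency."""
    return max(stages, key=lambda name: stages[name][2])


if __name__ == "__main__":
    low, high, tight = budget_totals(STAGES)
    print(f"typical: {low}-{high} ms, tight: {tight} ms")
    print(f"largest tight-budget stage: {largest_stage(STAGES)}")
```

Summed this way, the tight column gives the 880 ms total shown in the table, and the typical ranges add up to 685–1,900 ms. The LLM stage takes the largest share of the tight budget, so it is usually the first place to look when trimming latency.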