Quick Use
pip install livekit-agents livekit-plugins-openai livekit-plugins-deepgram livekit-plugins-cartesia livekit-plugins-silero
Sign up at cloud.livekit.io, then set LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET
python agent.py dev — the agent joins your test room
Intro
LiveKit Agents is a Python framework purpose-built for real-time voice AI — STT, LLM, TTS plugged together with VAD, end-of-turn detection, and barge-in handling solved out of the box. Runs on LiveKit Cloud or self-hosted LiveKit Server (WebRTC). Best for: voice agents on phone calls, browser voice chat, in-app voice copilots, anywhere a sub-1.5-second round trip matters. Works with: Python 3.10+, any STT (Deepgram, AssemblyAI, Groq Whisper), any LLM (OpenAI, Anthropic, Llama), any TTS (Cartesia, ElevenLabs, Deepgram). Setup time: 10 minutes.
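Conceptually, the pipeline is a chain of streaming stages: partial transcripts flow into the LLM as they arrive, and LLM tokens flow into TTS before the full reply exists. A framework-free sketch of that handoff, using toy stand-ins for the real STT/LLM/TTS plugins (none of these names are LiveKit APIs):

```python
import asyncio

# Toy stand-ins for streaming STT -> LLM -> TTS stages. The real framework
# wires actual plugin streams together, but the handoff shape is the same:
# each stage consumes an async stream and yields output incrementally.
async def stt(audio_chunks):
    async for chunk in audio_chunks:
        yield f"transcript({chunk})"      # emit partial transcripts as audio arrives

async def llm(transcripts):
    async for text in transcripts:
        yield f"reply-token[{text}]"      # stream tokens, don't wait for the full reply

async def tts(tokens):
    async for token in tokens:
        yield f"audio<{token}>"           # synthesize audio per token/sentence

async def main():
    async def mic():                      # fake microphone source
        for chunk in ("hello", "world"):
            yield chunk

    # Stages compose end to end; audio frames come out while input still flows in.
    return [frame async for frame in tts(llm(stt(mic())))]

frames = asyncio.run(main())
print(frames[0])
```

Because every stage streams, the first audio frame can play before the user's sentence has even been fully transcribed, which is what keeps the round trip under the latency budget below.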
Install
```shell
pip install livekit-agents \
    livekit-plugins-openai livekit-plugins-deepgram \
    livekit-plugins-cartesia livekit-plugins-silero
```

Minimum viable voice agent
```python
from livekit import agents
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli
from livekit.plugins import openai, deepgram, cartesia, silero


async def entrypoint(ctx: JobContext):
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    assistant = agents.VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=deepgram.STT(model="nova-3", language="multi"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(voice="alloy"),
        chat_ctx=agents.llm.ChatContext().append(
            role="system",
            text="You are a helpful voice assistant. Keep replies short — under 2 sentences.",
        ),
    )

    assistant.start(ctx.room)
    await assistant.say("Hi! What can I help you with?", allow_interruptions=True)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

Function calling (tool use mid-conversation)
```python
from livekit.agents.llm import function_tool


@function_tool
async def get_weather(location: str) -> str:
    """Get the current weather for a location."""
    return await my_weather_api(location)


assistant = agents.VoicePipelineAgent(
    ...,
    fnc_ctx=agents.llm.FunctionContext(tools=[get_weather]),
)
```

Latency budget
| Stage | Typical | Tight |
|---|---|---|
| VAD end-of-speech | 200–500ms | 200ms |
| STT (Deepgram Nova-3) | 60–250ms | 100ms |
| LLM (gpt-4o-mini streaming) | 300–800ms | 400ms |
| TTS first audio (Cartesia) | 75–200ms | 100ms |
| Network + WebRTC | 50–150ms | 80ms |
| Total round-trip | 685–1,900ms | ~880ms |
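The total is just the per-stage figures added up; a quick sanity check of the tight column:

```python
# Per-stage tight-budget latencies from the table above, in milliseconds.
tight_ms = {
    "vad_end_of_speech": 200,
    "stt_deepgram_nova3": 100,
    "llm_first_token": 400,
    "tts_first_audio": 100,
    "network_webrtc": 80,
}

total_ms = sum(tight_ms.values())
print(total_ms)  # 880, the ~880ms round-trip figure
```

If any single stage blows its budget (most often the LLM's first token), the whole round trip slips past the point where the pause feels natural, so it pays to measure each stage independently.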
Run locally with CLI
```shell
python agent.py dev    # connects to LiveKit Cloud dev URL, watches for code changes
python agent.py start  # production worker mode
```

FAQ
Q: LiveKit Agents vs Vapi vs Retell? A: Vapi and Retell are managed turnkey voice agent platforms — fast to ship, opinionated stack, less flexibility. LiveKit Agents is BYO components — pick your STT/LLM/TTS, deploy to your infra, optimize each stage. Choose LiveKit when you need control or cost optimization at scale.
Q: Can I use it without WebRTC? A: For phone calls yes — LiveKit has SIP trunking. For HTTP-only environments, no — the framework is built on the LiveKit room model. Alternatives: build directly on Twilio Media Streams + your own pipeline, or use a managed alternative like Vapi.
Q: How are interruptions handled?
A: The VAD detects the user starting to speak; the framework cancels the in-flight TTS playback, truncates the assistant's last incomplete utterance from chat history, and routes the new user audio to STT. Configure aggressiveness via silero.VAD.load(min_speech_duration=...).
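The cancel/truncate/re-route sequence above can be sketched as a toy handler. This is illustrative logic only, not LiveKit's internals; the class and method names are made up for the example:

```python
class BargeInHandler:
    """Toy model of interruption handling; not the framework's actual code."""

    def __init__(self):
        self.tts_playing = False
        self.chat_history = []  # list of [role, text, fully_spoken] entries

    def start_assistant_turn(self, text):
        self.tts_playing = True
        self.chat_history.append(["assistant", text, False])  # not yet fully spoken

    def on_user_speech_detected(self):
        """Called when VAD detects the user talking mid-playback."""
        if self.tts_playing:
            self.tts_playing = False                 # 1. cancel in-flight TTS playback
            role, text, fully_spoken = self.chat_history[-1]
            if not fully_spoken:                     # 2. truncate the incomplete
                self.chat_history[-1][1] = text[:20] + "…"  # utterance in history
        return "route_audio_to_stt"                  # 3. hand the new audio to STT


h = BargeInHandler()
h.start_assistant_turn("Sure, let me explain the full history of WebRTC...")
action = h.on_user_speech_detected()
print(action, h.tts_playing)
```

The truncation step matters: without it, the LLM's next turn would be conditioned on words the user never actually heard.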
Source & Thanks
Built by LiveKit. Licensed under Apache-2.0.
livekit/agents — ⭐ 4,500+