ScriptsMay 11, 2026·4 min read

LiveKit Agents — Python Framework for Voice AI

LiveKit Agents is a Python framework for real-time voice AI. Pluggable STT/LLM/TTS, VAD, barge-in. Run on LiveKit Cloud or self-host.

Agent ready

Safe staging for this asset

This asset is staged first. The copied prompt tells the agent to inspect the staged files and ask before activating scripts, MCP config, or global config.

Stage only · 29/100Policy: stage
Agent surface
Any MCP/CLI agent
Kind
Skill
Install
Stage only
Trust
Trust: Community
Entrypoint
Asset
Safe staging command
npx -y tokrepo@latest install 55c4cb5e-5a2e-4fbc-a8a6-fb3bbf3046f5 --target codex

Stages files first; activation requires review of the staged README and plan.

Intro

LiveKit Agents is a Python framework purpose-built for real-time voice AI — STT, LLM, TTS plugged together with VAD, end-of-turn detection, and barge-in handling solved out of the box. Runs on LiveKit Cloud or self-hosted LiveKit Server (WebRTC). Best for: voice agents on phone calls, browser voice chat, in-app voice copilots, anywhere a sub-1.5-second round trip matters. Works with: Python 3.10+, any STT (Deepgram, AssemblyAI, Groq Whisper), any LLM (OpenAI, Anthropic, Llama), any TTS (Cartesia, ElevenLabs, Deepgram). Setup time: 10 minutes.


Install

pip install livekit-agents \
  livekit-plugins-openai livekit-plugins-deepgram livekit-plugins-cartesia livekit-plugins-silero

Minimum viable voice agent

import asyncio
from livekit import agents
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli
from livekit.plugins import openai, deepgram, cartesia, silero

async def entrypoint(ctx: JobContext):
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    assistant = agents.VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=deepgram.STT(model="nova-3", language="multi"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(voice="alloy"),
        chat_ctx=agents.llm.ChatContext().append(
            role="system",
            text="You are a helpful voice assistant. Keep replies short — under 2 sentences.",
        ),
    )
    assistant.start(ctx.room)
    await assistant.say("Hi! What can I help you with?", allow_interruptions=True)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Function calling (tool use mid-conversation)

from livekit.agents.llm import function_tool

@function_tool
async def get_weather(location: str) -> str:
    '''Get the current weather for a location.'''
    return await my_weather_api(location)

assistant = agents.VoicePipelineAgent(
    ...,
    fnc_ctx=agents.llm.FunctionContext(tools=[get_weather]),
)

Latency budget

Stage Typical Tight
VAD end-of-speech 200–500ms 200ms
STT (Deepgram Nova-3) 60–250ms 100ms
LLM (gpt-4o-mini streaming) 300–800ms 400ms
TTS first audio (Cartesia) 75–200ms 100ms
Network + WebRTC 50–150ms 80ms
Total round-trip ~880ms

Run locally with CLI

python agent.py dev   # connects to LiveKit Cloud dev URL, watches for code changes
python agent.py start # production worker mode

FAQ

Q: LiveKit Agents vs Vapi vs Retell? A: Vapi and Retell are managed turnkey voice agent platforms — fast to ship, opinionated stack, less flexibility. LiveKit Agents is BYO components — pick your STT/LLM/TTS, deploy to your infra, optimize each stage. Choose LiveKit when you need control or cost optimization at scale.

Q: Can I use it without WebRTC? A: For phone calls yes — LiveKit has SIP trunking. For HTTP-only environments, no — the framework is built on the LiveKit room model. Alternatives: build directly on Twilio Media Streams + your own pipeline, or use a managed alternative like Vapi.

Q: How are interruptions handled? A: The VAD detects the user starting to speak; the framework cancels the in-flight TTS playback, truncates the assistant's last incomplete utterance from chat history, and routes the new user audio to STT. Configure aggressiveness via silero.VAD.load(min_speech_duration=...).


Quick Use

  1. pip install livekit-agents livekit-plugins-openai livekit-plugins-deepgram livekit-plugins-cartesia livekit-plugins-silero
  2. Sign up at cloud.livekit.io, set LIVEKIT_URL + API_KEY + API_SECRET
  3. python agent.py dev — agent joins your test room

Intro

LiveKit Agents is a Python framework purpose-built for real-time voice AI — STT, LLM, TTS plugged together with VAD, end-of-turn detection, and barge-in handling solved out of the box. Runs on LiveKit Cloud or self-hosted LiveKit Server (WebRTC). Best for: voice agents on phone calls, browser voice chat, in-app voice copilots, anywhere a sub-1.5-second round trip matters. Works with: Python 3.10+, any STT (Deepgram, AssemblyAI, Groq Whisper), any LLM (OpenAI, Anthropic, Llama), any TTS (Cartesia, ElevenLabs, Deepgram). Setup time: 10 minutes.


Install

pip install livekit-agents \
  livekit-plugins-openai livekit-plugins-deepgram livekit-plugins-cartesia livekit-plugins-silero

Minimum viable voice agent

import asyncio
from livekit import agents
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli
from livekit.plugins import openai, deepgram, cartesia, silero

async def entrypoint(ctx: JobContext):
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    assistant = agents.VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=deepgram.STT(model="nova-3", language="multi"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(voice="alloy"),
        chat_ctx=agents.llm.ChatContext().append(
            role="system",
            text="You are a helpful voice assistant. Keep replies short — under 2 sentences.",
        ),
    )
    assistant.start(ctx.room)
    await assistant.say("Hi! What can I help you with?", allow_interruptions=True)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Function calling (tool use mid-conversation)

from livekit.agents.llm import function_tool

@function_tool
async def get_weather(location: str) -> str:
    '''Get the current weather for a location.'''
    return await my_weather_api(location)

assistant = agents.VoicePipelineAgent(
    ...,
    fnc_ctx=agents.llm.FunctionContext(tools=[get_weather]),
)

Latency budget

Stage Typical Tight
VAD end-of-speech 200–500ms 200ms
STT (Deepgram Nova-3) 60–250ms 100ms
LLM (gpt-4o-mini streaming) 300–800ms 400ms
TTS first audio (Cartesia) 75–200ms 100ms
Network + WebRTC 50–150ms 80ms
Total round-trip ~880ms

Run locally with CLI

python agent.py dev   # connects to LiveKit Cloud dev URL, watches for code changes
python agent.py start # production worker mode

FAQ

Q: LiveKit Agents vs Vapi vs Retell? A: Vapi and Retell are managed turnkey voice agent platforms — fast to ship, opinionated stack, less flexibility. LiveKit Agents is BYO components — pick your STT/LLM/TTS, deploy to your infra, optimize each stage. Choose LiveKit when you need control or cost optimization at scale.

Q: Can I use it without WebRTC? A: For phone calls yes — LiveKit has SIP trunking. For HTTP-only environments, no — the framework is built on the LiveKit room model. Alternatives: build directly on Twilio Media Streams + your own pipeline, or use a managed alternative like Vapi.

Q: How are interruptions handled? A: The VAD detects the user starting to speak; the framework cancels the in-flight TTS playback, truncates the assistant's last incomplete utterance from chat history, and routes the new user audio to STT. Configure aggressiveness via silero.VAD.load(min_speech_duration=...).


Source & Thanks

Built by LiveKit. Licensed under Apache-2.0.

livekit/agents — ⭐ 4,500+

🙏

Source & Thanks

Built by LiveKit. Licensed under Apache-2.0.

livekit/agents — ⭐ 4,500+

Discussion

Sign in to join the discussion.
No comments yet. Be the first to share your thoughts.

Related Assets