Mar 30, 2026 · 2 min read

LiveKit Agents — Build Real-Time Voice AI Agents

Framework for building real-time voice AI agents. STT, LLM, TTS pipeline with sub-second latency. Supports OpenAI, Anthropic, Deepgram, ElevenLabs. 9.9K+ stars.

TL;DR
LiveKit Agents connects STT, LLM, and TTS into a real-time voice pipeline over WebRTC for building voice AI agents.
§01

What it is

LiveKit Agents is an open-source Python framework for building real-time voice AI agents. It provides a pipeline architecture that chains Speech-to-Text, LLM processing, and Text-to-Speech with sub-second end-to-end latency, all running over LiveKit's WebRTC infrastructure.

The framework is designed for developers building voice assistants, phone agents, video call AI participants, and conversational interfaces. It supports multiple providers including OpenAI, Anthropic, Deepgram, and ElevenLabs.

§02

How it saves time or tokens

Without LiveKit Agents, building a voice AI pipeline requires stitching together separate STT, LLM, and TTS services, managing WebRTC connections, handling voice activity detection, and dealing with audio streaming protocols. LiveKit Agents abstracts all of this into a modular pipeline where you pick your providers and the framework handles the real-time orchestration.

The plugin system means swapping providers is a one-line change. Moving from OpenAI TTS to ElevenLabs requires changing only the TTS parameter in your AgentSession constructor.
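As a sketch of that swap (assuming the `livekit-plugins-elevenlabs` package is installed and exposes an `elevenlabs.TTS` class, following the same convention as the other plugins):

```python
from livekit.agents.voice import AgentSession
from livekit.plugins import elevenlabs, openai, silero

# Same pipeline as before; only the tts argument changes.
session = AgentSession(
    stt=openai.STT(),
    llm=openai.LLM(),
    tts=elevenlabs.TTS(),  # was: openai.TTS()
    vad=silero.VAD.load(),
)
```

Because every plugin implements the same STT/LLM/TTS interfaces, the rest of the agent code is untouched by the swap.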

§03

How to use

  1. Install the framework and your chosen plugins: pip install livekit-agents livekit-plugins-openai livekit-plugins-silero.
  2. Define an entrypoint function that creates an AgentSession with your STT, LLM, TTS, and VAD providers.
  3. Run the agent with cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint)) and connect clients via LiveKit rooms.
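Put together, a typical local setup might look like the following sketch (the environment variable names and the agent.py filename are assumptions; a self-hosted server URL works the same way as LiveKit Cloud):

```shell
# 1. Install the framework plus the OpenAI and Silero plugins
pip install livekit-agents livekit-plugins-openai livekit-plugins-silero

# 2. Credentials the worker reads at startup
export LIVEKIT_URL=wss://your-project.livekit.cloud
export LIVEKIT_API_KEY=...
export LIVEKIT_API_SECRET=...
export OPENAI_API_KEY=...

# 3. Run the worker in development mode
python agent.py dev
```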
§04

Example

from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli
from livekit.agents.voice import AgentSession, Agent
from livekit.plugins import openai, silero

async def entrypoint(ctx: JobContext):
    # Join the room, subscribing to audio tracks only
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    # Assemble the pipeline: VAD gates audio into STT -> LLM -> TTS
    session = AgentSession(
        stt=openai.STT(),
        llm=openai.LLM(),
        tts=openai.TTS(),
        vad=silero.VAD.load(),
    )
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice assistant."),
    )

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
§05

Common pitfalls

  • WebRTC requires proper TURN/STUN server configuration for production deployments behind firewalls or NAT; the local development setup may not reflect real network conditions.
  • Voice activity detection (VAD) tuning is critical: too sensitive and the agent interrupts users mid-sentence; too conservative and response latency increases.
  • Each provider plugin has its own API key requirements; ensure all keys are set in environment variables before starting the agent.
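A pre-flight check along these lines can catch missing keys before the worker starts. The variable names shown are an assumption for an OpenAI plus LiveKit setup; the exact set depends on which plugins you use:

```python
import os

# Assumed variable names for an OpenAI + LiveKit setup; adjust per plugin.
REQUIRED_KEYS = ["LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "OPENAI_API_KEY"]

def missing_keys(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [k for k in REQUIRED_KEYS if not env.get(k)]

# Example: a partially configured environment
print(missing_keys({"LIVEKIT_URL": "wss://example.livekit.cloud"}))
# → ['LIVEKIT_API_KEY', 'LIVEKIT_API_SECRET', 'OPENAI_API_KEY']
```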

Frequently Asked Questions

What AI providers does LiveKit Agents support?

LiveKit Agents supports OpenAI (including the Realtime API), Anthropic, Deepgram, ElevenLabs, Azure, Google, Cartesia, AssemblyAI, and Silero for voice activity detection. The plugin architecture makes adding new providers straightforward.

What is the typical end-to-end latency?

LiveKit Agents achieves sub-second end-to-end latency from user speech to agent response in typical configurations. The exact latency depends on your choice of STT, LLM, and TTS providers and their respective API response times.
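As a rough illustration (the per-stage numbers below are hypothetical, not measurements), the end-to-end latency is approximately the sum of the stage latencies on the critical path:

```python
# Hypothetical per-stage latencies in milliseconds; real values vary by provider.
stages = {
    "vad_endpointing": 150,  # silence required before end-of-turn is declared
    "stt_final": 200,        # final transcript after end of speech
    "llm_first_token": 300,  # time to first token from the LLM
    "tts_first_audio": 150,  # time to first synthesized audio chunk
}

total_ms = sum(stages.values())
print(f"end-to-end: {total_ms} ms")  # → end-to-end: 800 ms
```

Streaming helps here: because TTS can begin on the LLM's first tokens, what matters is time-to-first-audio, not the full response time of each stage.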

Can I use LiveKit Agents for phone calls?

Yes. LiveKit provides SIP integration that connects phone calls to LiveKit rooms. Your voice agent handles the audio the same way whether the caller is on a phone line or a WebRTC browser client.

Do I need to self-host LiveKit server?

You can either self-host the open-source LiveKit server or use LiveKit Cloud as a managed service. The Agents framework works with both deployment options.

How does voice activity detection work?

LiveKit Agents uses Silero VAD by default to detect when a user starts and stops speaking. This controls when audio is sent to the STT provider and prevents the agent from processing background noise or partial utterances.
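Tuning might look like the following sketch; the parameter names are an assumption based on the Silero plugin and may differ across versions, so check the docs for your installed release:

```python
from livekit.plugins import silero

# Parameter names below are assumptions; verify against your plugin version.
vad = silero.VAD.load(
    min_silence_duration=0.6,  # longer pause required before end-of-turn
    activation_threshold=0.6,  # higher value ignores quieter background speech
)
```

Raising the silence duration reduces mid-sentence interruptions at the cost of slower responses, which is the trade-off noted under Common pitfalls.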


Source & Thanks

Created by LiveKit. Licensed under Apache 2.0. livekit/agents — 9,900+ GitHub stars
