Scripts · May 11, 2026 · 4 min read

LiveKit Agents — Python Framework for Voice AI

LiveKit Agents is a Python framework for real-time voice AI. Pluggable STT/LLM/TTS, VAD, barge-in. Run on LiveKit Cloud or self-host.

LiveKit · Community
Agent-ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and the raw content to help agents judge fit, risk, and next actions.

Stage only · 17/100
Agent surface
Any MCP/CLI agent
Type
Skill
Installation
Stage only
Trust
New
Entry point
Asset
Universal CLI command
npx tokrepo install 55c4cb5e-5a2e-4fbc-a8a6-fb3bbf3046f5
Introduction

LiveKit Agents is a Python framework purpose-built for real-time voice AI — STT, LLM, TTS plugged together with VAD, end-of-turn detection, and barge-in handling solved out of the box. Runs on LiveKit Cloud or self-hosted LiveKit Server (WebRTC). Best for: voice agents on phone calls, browser voice chat, in-app voice copilots, anywhere a sub-1.5-second round trip matters. Works with: Python 3.10+, any STT (Deepgram, AssemblyAI, Groq Whisper), any LLM (OpenAI, Anthropic, Llama), any TTS (Cartesia, ElevenLabs, Deepgram). Setup time: 10 minutes.


Install

pip install livekit-agents \
  livekit-plugins-openai livekit-plugins-deepgram livekit-plugins-cartesia livekit-plugins-silero

Minimum viable voice agent

from livekit import agents
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli
from livekit.plugins import openai, deepgram, cartesia, silero

async def entrypoint(ctx: JobContext):
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    assistant = agents.VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=deepgram.STT(model="nova-3", language="multi"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(),  # default voice; pass a Cartesia voice ID to customize ("alloy" is an OpenAI voice name, not a Cartesia one)
        chat_ctx=agents.llm.ChatContext().append(
            role="system",
            text="You are a helpful voice assistant. Keep replies short — under 2 sentences.",
        ),
    )
    assistant.start(ctx.room)
    await assistant.say("Hi! What can I help you with?", allow_interruptions=True)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Function calling (tool use mid-conversation)

from livekit.agents.llm import function_tool

@function_tool
async def get_weather(location: str) -> str:
    """Get the current weather for a location."""
    # my_weather_api is a placeholder for your own async weather lookup.
    return await my_weather_api(location)

assistant = agents.VoicePipelineAgent(
    ...,  # vad/stt/llm/tts as in the minimal agent above
    fnc_ctx=agents.llm.FunctionContext(tools=[get_weather]),
)

Latency budget

Stage                         Typical     Tight
VAD end-of-speech             200–500 ms  200 ms
STT (Deepgram Nova-3)         60–250 ms   100 ms
LLM (gpt-4o-mini, streaming)  300–800 ms  400 ms
TTS first audio (Cartesia)    75–200 ms   100 ms
Network + WebRTC              50–150 ms   80 ms
Total round-trip              ~0.7–1.9 s  ~880 ms
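As a sanity check, the tight-column figures sum to the quoted round trip. The values below are copied from the table; this is simple budget arithmetic, not a measurement:

```python
# "Tight" per-stage latency budget, in milliseconds, from the table above.
TIGHT_BUDGET_MS = {
    "vad_end_of_speech": 200,
    "stt_deepgram_nova3": 100,
    "llm_gpt4o_mini_first_token": 400,
    "tts_cartesia_first_audio": 100,
    "network_webrtc": 80,
}

total_ms = sum(TIGHT_BUDGET_MS.values())
print(f"tight round-trip: {total_ms} ms")  # → tight round-trip: 880 ms
```

Tracking a per-stage dict like this in your own metrics makes it obvious which stage to optimize first when the end-to-end number drifts.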

Run locally with CLI

python agent.py dev   # connects to LiveKit Cloud dev URL, watches for code changes
python agent.py start # production worker mode

FAQ

Q: LiveKit Agents vs Vapi vs Retell? A: Vapi and Retell are managed turnkey voice agent platforms — fast to ship, opinionated stack, less flexibility. LiveKit Agents is BYO components — pick your STT/LLM/TTS, deploy to your infra, optimize each stage. Choose LiveKit when you need control or cost optimization at scale.

Q: Can I use it without WebRTC? A: For phone calls yes — LiveKit has SIP trunking. For HTTP-only environments, no — the framework is built on the LiveKit room model. Alternatives: build directly on Twilio Media Streams + your own pipeline, or use a managed alternative like Vapi.

Q: How are interruptions handled? A: The VAD detects the user starting to speak; the framework cancels the in-flight TTS playback, truncates the assistant's last incomplete utterance from chat history, and routes the new user audio to STT. Configure aggressiveness via silero.VAD.load(min_speech_duration=...).
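The sequence in that answer can be sketched as a toy state machine. This is illustrative only, not the framework's internals; all names here are hypothetical:

```python
class BargeInHandler:
    """Toy model of barge-in: cancel TTS, drop the incomplete utterance, reroute audio."""

    def __init__(self):
        self.tts_playing = False
        self.chat_history = []  # list of (role, text) tuples

    def assistant_speaks(self, text):
        # Assistant reply is appended to history as its TTS playback starts.
        self.chat_history.append(("assistant", text))
        self.tts_playing = True

    def on_user_speech_detected(self):
        """Called when VAD detects the user talking over the agent."""
        if self.tts_playing:
            self.tts_playing = False          # 1. cancel in-flight TTS playback
            role, _ = self.chat_history[-1]
            if role == "assistant":
                self.chat_history.pop()       # 2. remove the incomplete utterance
        return "route_audio_to_stt"           # 3. new user audio goes to STT

handler = BargeInHandler()
handler.assistant_speaks("Let me explain the full history of...")
action = handler.on_user_speech_detected()
print(action, handler.tts_playing, len(handler.chat_history))  # → route_audio_to_stt False 0
```

In the real pipeline the truncation is finer-grained (the history keeps whatever audio was actually played back), but the ordering of the three steps is the part that matters.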


Quick Use

  1. pip install livekit-agents livekit-plugins-openai livekit-plugins-deepgram livekit-plugins-cartesia livekit-plugins-silero
  2. Sign up at cloud.livekit.io and set LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET
  3. python agent.py dev — agent joins your test room
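Step 2 maps to three environment variables. The names follow LiveKit's convention; the values below are placeholders for your project's credentials:

```shell
# Credentials from your LiveKit Cloud project settings (placeholder values).
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export LIVEKIT_API_KEY="your-api-key"
export LIVEKIT_API_SECRET="your-api-secret"
```

A `.env` file with the same three keys works too if you load it before starting the worker.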

Source & Thanks

Built by LiveKit. Licensed under Apache-2.0.

livekit/agents — ⭐ 4,500+

