Scripts · May 11, 2026 · 4 min read

LiveKit Agents — Python Framework for Voice AI

LiveKit Agents is a Python framework for real-time voice AI. Pluggable STT/LLM/TTS, VAD, barge-in. Run on LiveKit Cloud or self-host.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, a per-adapter plan, and raw content so agents can evaluate compatibility, risk, and next steps.

Score: Stage only · 17/100
Agent surface: any MCP/CLI agent
Type: Skill
Install: Stage only
Trust: New
Input: Asset

Universal CLI command:
npx tokrepo install 55c4cb5e-5a2e-4fbc-a8a6-fb3bbf3046f5
Introduction

LiveKit Agents is a Python framework purpose-built for real-time voice AI — STT, LLM, TTS plugged together with VAD, end-of-turn detection, and barge-in handling solved out of the box. Runs on LiveKit Cloud or self-hosted LiveKit Server (WebRTC). Best for: voice agents on phone calls, browser voice chat, in-app voice copilots, anywhere a sub-1.5-second round trip matters. Works with: Python 3.10+, any STT (Deepgram, AssemblyAI, Groq Whisper), any LLM (OpenAI, Anthropic, Llama), any TTS (Cartesia, ElevenLabs, Deepgram). Setup time: 10 minutes.


Install

pip install livekit-agents \
  livekit-plugins-openai livekit-plugins-deepgram livekit-plugins-cartesia livekit-plugins-silero

Minimum viable voice agent

from livekit import agents
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli
from livekit.plugins import openai, deepgram, cartesia, silero

async def entrypoint(ctx: JobContext):
    # Subscribe to audio tracks only; video is irrelevant for a voice agent.
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    assistant = agents.VoicePipelineAgent(
        vad=silero.VAD.load(),  # local voice-activity detection
        stt=deepgram.STT(model="nova-3", language="multi"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(voice="alloy"),
        chat_ctx=agents.llm.ChatContext().append(
            role="system",
            text="You are a helpful voice assistant. Keep replies short — under 2 sentences.",
        ),
    )
    assistant.start(ctx.room)
    await assistant.say("Hi! What can I help you with?", allow_interruptions=True)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Function calling (tool use mid-conversation)

from livekit.agents.llm import function_tool

@function_tool
async def get_weather(location: str) -> str:
    """Get the current weather for a location."""
    # my_weather_api is a placeholder for your own weather client.
    return await my_weather_api(location)

assistant = agents.VoicePipelineAgent(
    ...,  # vad/stt/llm/tts as in the minimal agent above
    fnc_ctx=agents.llm.FunctionContext(tools=[get_weather]),
)

Latency budget

Stage                         Typical     Tight
VAD end-of-speech             200–500ms   200ms
STT (Deepgram Nova-3)         60–250ms    100ms
LLM (gpt-4o-mini streaming)   300–800ms   400ms
TTS first audio (Cartesia)    75–200ms    100ms
Network + WebRTC              50–150ms    80ms
Total round-trip              ~0.7–1.9s   ~880ms
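
As a sanity check, the tight-column targets sum to the quoted round-trip total (a trivial arithmetic sketch; the dict keys are just shorthand for the stage names above):

```python
# Sum the "tight" latency targets from the budget above.
tight_ms = {
    "vad_end_of_speech": 200,
    "stt_deepgram_nova3": 100,
    "llm_first_token": 400,
    "tts_first_audio": 100,
    "network_webrtc": 80,
}
total = sum(tight_ms.values())
print(total)  # 880
```

The typical column sums to roughly 0.7–1.9s, so hitting the sub-1.5-second target from the intro means keeping most stages near the low end of their typical ranges.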

Run locally with CLI

python agent.py dev   # connects to LiveKit Cloud dev URL, watches for code changes
python agent.py start # production worker mode

FAQ

Q: LiveKit Agents vs Vapi vs Retell? A: Vapi and Retell are managed turnkey voice agent platforms — fast to ship, opinionated stack, less flexibility. LiveKit Agents is BYO components — pick your STT/LLM/TTS, deploy to your infra, optimize each stage. Choose LiveKit when you need control or cost optimization at scale.

Q: Can I use it without WebRTC? A: For phone calls yes — LiveKit has SIP trunking. For HTTP-only environments, no — the framework is built on the LiveKit room model. Alternatives: build directly on Twilio Media Streams + your own pipeline, or use a managed alternative like Vapi.

Q: How are interruptions handled? A: The VAD detects the user starting to speak; the framework cancels the in-flight TTS playback, truncates the assistant's last incomplete utterance from chat history, and routes the new user audio to STT. Configure aggressiveness via silero.VAD.load(min_speech_duration=...).
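
The cancel, truncate, and reroute sequence described above can be sketched in plain Python. This is a hypothetical illustration of the bookkeeping only, not LiveKit's actual internals; ChatHistory and on_user_speech_start are invented names:

```python
from dataclasses import dataclass, field

@dataclass
class ChatHistory:
    # Each turn is (role, text, complete); complete=False marks an
    # utterance whose playback was cut off mid-sentence.
    turns: list = field(default_factory=list)

    def append(self, role: str, text: str, complete: bool = True) -> None:
        self.turns.append((role, text, complete))

    def truncate_incomplete_assistant(self) -> None:
        # Drop the assistant's last utterance if it never finished playing,
        # so the LLM doesn't see a reply the user never heard in full.
        if self.turns and self.turns[-1][0] == "assistant" and not self.turns[-1][2]:
            self.turns.pop()

def on_user_speech_start(tts_playing: bool, history: ChatHistory) -> str:
    """What happens when the VAD fires while the agent is mid-playback."""
    if tts_playing:
        history.truncate_incomplete_assistant()  # cancel + truncate
    return "route_audio_to_stt"                  # reroute the new audio

history = ChatHistory()
history.append("assistant", "The weather today is", complete=False)
action = on_user_speech_start(tts_playing=True, history=history)
```

The truncated turn is gone from history and the new user audio flows to STT, which is exactly the behavior the FAQ answer describes.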


Quick Use

  1. pip install livekit-agents livekit-plugins-openai livekit-plugins-deepgram livekit-plugins-cartesia livekit-plugins-silero
  2. Sign up at cloud.livekit.io and set LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET
  3. python agent.py dev — agent joins your test room
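
The credentials in step 2 are read from the environment by the worker. A minimal shell setup; the values shown are placeholders for the real credentials from your LiveKit Cloud dashboard:

```shell
# Export LiveKit credentials so the worker can authenticate.
# All three values below are placeholders — copy yours from the dashboard.
export LIVEKIT_URL="wss://your-project.livekit.cloud"
export LIVEKIT_API_KEY="APIxxxxxxxx"
export LIVEKIT_API_SECRET="secretxxxxxxxx"
```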

Source & Thanks

Built by LiveKit. Licensed under Apache-2.0.

livekit/agents — ⭐ 4,500+
