LiveKit Agents — Build Real-Time Voice AI Agents
Framework for building real-time voice AI agents. STT, LLM, TTS pipeline with sub-second latency. Supports OpenAI, Anthropic, Deepgram, ElevenLabs. 9.9K+ stars.
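Agent-ready install
This asset can be installed by an agent: select the current runtime, check the install plan, then run the matching command.
npx -y tokrepo@latest install 804ee888-b285-4369-891e-15f424f587ed --target codex
Do a dry run first to confirm the install plan, then run this command.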
What it is
LiveKit Agents is an open-source Python framework for building real-time voice AI agents. It provides a pipeline architecture that chains Speech-to-Text, LLM processing, and Text-to-Speech with sub-second end-to-end latency, all running over LiveKit's WebRTC infrastructure.
The framework is designed for developers building voice assistants, phone agents, video call AI participants, and conversational interfaces. It supports multiple providers including OpenAI, Anthropic, Deepgram, and ElevenLabs.
How it saves time or tokens
Without LiveKit Agents, building a voice AI pipeline requires stitching together separate STT, LLM, and TTS services, managing WebRTC connections, handling voice activity detection, and dealing with audio streaming protocols. LiveKit Agents abstracts all of this into a modular pipeline where you pick your providers and the framework handles the real-time orchestration.
The plugin system means swapping providers is a one-line change. Moving from OpenAI TTS to ElevenLabs requires changing only the TTS parameter in your AgentSession constructor.
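To make the plugin idea concrete, here is a minimal, library-independent sketch of a provider-swappable pipeline. It does not use the LiveKit API: the Pipeline class and the FakeSTT, EchoLLM, TTSProviderA, and TTSProviderB classes are hypothetical stand-ins, and a real framework would stream audio asynchronously rather than process one blocking turn at a time.

```python
from dataclasses import dataclass
from typing import Protocol


class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LLM(Protocol):
    def reply(self, text: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


# Hypothetical stand-in providers; real plugins would call remote APIs.
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class EchoLLM:
    def reply(self, text: str) -> str:
        return f"You said: {text}"


class TTSProviderA:
    def synthesize(self, text: str) -> bytes:
        return b"A:" + text.encode("utf-8")


class TTSProviderB:
    def synthesize(self, text: str) -> bytes:
        return b"B:" + text.encode("utf-8")


@dataclass
class Pipeline:
    stt: STT
    llm: LLM
    tts: TTS

    def handle_turn(self, audio: bytes) -> bytes:
        # One conversational turn: speech -> text -> reply -> speech.
        text = self.stt.transcribe(audio)
        answer = self.llm.reply(text)
        return self.tts.synthesize(answer)


pipeline_a = Pipeline(stt=FakeSTT(), llm=EchoLLM(), tts=TTSProviderA())
# Swapping the TTS provider changes only one constructor argument.
pipeline_b = Pipeline(stt=FakeSTT(), llm=EchoLLM(), tts=TTSProviderB())
```

Because the pipeline depends only on the three small interfaces, the rest of the code is untouched when a provider changes, which is the property the one-line swap relies on.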
How to use
- Install the framework and your chosen plugins: pip install livekit-agents livekit-plugins-openai livekit-plugins-silero
- Define an entrypoint function that creates an AgentSession with your STT, LLM, TTS, and VAD providers.
- Run the agent with cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint)) and connect clients via LiveKit rooms.
Example
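from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli
from livekit.agents.voice import Agent, AgentSession
from livekit.plugins import openai, silero


async def entrypoint(ctx: JobContext):
    # Join the room and subscribe to audio tracks only.
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # Wire the pipeline: VAD gates audio into STT, then LLM, then TTS.
    session = AgentSession(
        stt=openai.STT(),
        llm=openai.LLM(),
        tts=openai.TTS(),
        vad=silero.VAD.load(),
    )

    # Assumes the livekit-agents 1.x API, where start() takes the Agent first
    # and the room as a keyword; the instructions string is a placeholder.
    await session.start(agent=Agent(instructions="You are a helpful voice assistant."), room=ctx.room)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))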
Related on TokRepo
- AI agent tools -- frameworks and platforms for building AI agents
- Voice tools -- speech and audio AI tools
Common pitfalls
- WebRTC requires proper TURN/STUN server configuration for production deployments behind firewalls or NAT; the local development setup may not reflect real network conditions.
- Voice activity detection (VAD) tuning is critical -- too sensitive and the agent interrupts users mid-sentence, too conservative and response latency increases.
- Each provider plugin has its own API key requirements; ensure all keys are set in environment variables before starting the agent.
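The API key pitfall can be caught before the worker starts with a small preflight check. The sketch below is one way to do it, not part of LiveKit Agents: the helper names are hypothetical, and the variable names in REQUIRED_ENV_VARS are assumptions for a LiveKit plus OpenAI setup that should be confirmed against each plugin's documentation.

```python
import os
from typing import List, Mapping, Sequence

# Assumed variable names for a LiveKit + OpenAI setup; verify against your plugins' docs.
REQUIRED_ENV_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "OPENAI_API_KEY",
)


def missing_env_vars(required: Sequence[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names in `required` that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


def require_env(required: Sequence[str], env: Mapping[str, str] = os.environ) -> None:
    """Raise a single error listing every missing variable, instead of failing one at a time."""
    missing = missing_env_vars(required, env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling require_env(REQUIRED_ENV_VARS) at the top of the script, before cli.run_app, reports every missing key at once rather than surfacing them one by one as each provider is first used.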
FAQ
Which providers does LiveKit Agents support?
LiveKit Agents supports OpenAI (including the Realtime API), Anthropic, Deepgram, ElevenLabs, Azure, Google, Cartesia, AssemblyAI, and Silero for voice activity detection. The plugin architecture makes adding new providers straightforward.
What latency can I expect?
LiveKit Agents achieves sub-second end-to-end latency from user speech to agent response in typical configurations. The exact latency depends on your choice of STT, LLM, and TTS providers and their respective API response times.
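Since the total depends on each provider, it can help to write the latency budget down stage by stage. The sketch below sums per-stage times to first output; every number is an illustrative placeholder, not a measurement of any provider, and the stage names are hypothetical.

```python
# Illustrative per-stage latencies in milliseconds; placeholders, not measurements.
stage_latency_ms = {
    "vad_end_of_speech": 200,     # silence required before the turn is considered over
    "stt_final_transcript": 150,  # time to receive the final transcript
    "llm_first_token": 300,       # time to the first generated token
    "tts_first_audio": 150,       # time to the first synthesized audio chunk
    "network_round_trip": 50,     # transport overhead between client and agent
}


def total_latency_ms(stages: dict) -> int:
    """Sum the stage latencies, assuming the stages run strictly one after another."""
    return sum(stages.values())


def slowest_stage(stages: dict) -> str:
    """Return the name of the stage that contributes the most latency."""
    return max(stages, key=stages.get)


total = total_latency_ms(stage_latency_ms)
bottleneck = slowest_stage(stage_latency_ms)
```

With these placeholder values the budget stays under one second and the LLM's first token is the largest contributor; with real measurements, the same breakdown shows which provider is worth swapping first.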
Can LiveKit Agents handle phone calls?
Yes. LiveKit provides SIP integration that connects phone calls to LiveKit rooms. Your voice agent handles the audio the same way whether the caller is on a phone line or a WebRTC browser client.
Do I need LiveKit Cloud?
No. You can either self-host the open-source LiveKit server or use LiveKit Cloud as a managed service. The Agents framework works with both deployment options.
How does voice activity detection work?
LiveKit Agents uses Silero VAD by default to detect when a user starts and stops speaking. This controls when audio is sent to the STT provider and prevents the agent from processing background noise or partial utterances.
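The trade-off between interrupting users and adding latency can be seen in a toy end-of-speech detector. This is a simple energy-threshold sketch, not Silero VAD (which is a neural model) and not LiveKit code; the function name, the frame energies, and the parameters are all hypothetical.

```python
from typing import Optional, Sequence


def end_of_speech_frame(
    energies: Sequence[float], threshold: float, min_silence_frames: int
) -> Optional[int]:
    """Return the frame index at which end of speech is declared, or None.

    Silence only counts once speech has started, i.e. after at least one
    frame with energy at or above `threshold`.
    """
    started = False
    silent_run = 0
    for i, energy in enumerate(energies):
        if energy >= threshold:
            started = True
            silent_run = 0
        elif started:
            silent_run += 1
            if silent_run >= min_silence_frames:
                return i
    return None


# Speech, a short two-frame pause, more speech, then a long silence.
frames = [0.0, 0.8, 0.9, 0.1, 0.1, 0.7, 0.8, 0.1, 0.1, 0.1, 0.1, 0.1]

eager = end_of_speech_frame(frames, threshold=0.5, min_silence_frames=2)
patient = end_of_speech_frame(frames, threshold=0.5, min_silence_frames=4)
```

The eager setting ends the turn at frame 4, inside the speaker's short pause, so the agent would cut in mid-sentence. The patient setting waits until frame 10, four frames after the last speech frame, which avoids the interruption but adds that wait to every response.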
Source and credits
Created by LiveKit. Licensed under Apache 2.0. livekit/agents — 9,900+ GitHub stars