OpenAI Realtime Agents — Voice AI Agent Patterns
Advanced agentic patterns for voice AI built on the OpenAI Realtime API: chat-supervisor and sequential handoff, with WebRTC audio streaming. MIT-licensed, 6,800+ GitHub stars.
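Safe staging
This asset is staged safely first. The copied instruction tells the agent to read the staged files and to ask for confirmation before activating any scripts, MCP configuration, or global configuration.
npx -y tokrepo@latest install 0d228731-33e3-11f1-9bc6-00163e2b0d79 --target codex
This stages the files first; before activating, read the staged README and the install plan.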
What it is
OpenAI Realtime Agents is an official OpenAI demo of advanced agentic patterns for voice AI. It demonstrates two patterns: Chat-Supervisor, in which a realtime voice agent delegates complex tasks to a more capable text model such as GPT-4.1, and Sequential Handoff, in which specialized agents transfer the user between each other based on intent. It is built with the OpenAI Agents SDK and streams voice over WebRTC.
It is designed for developers building voice-enabled AI applications, customer service bots, or multi-agent voice systems.
How it saves time or tokens
The chat-supervisor pattern reduces token cost by using a fast, lightweight voice model for conversation flow while delegating expensive reasoning to a text model only when needed. This avoids routing every utterance through a large model. The sequential handoff pattern prevents prompt bloat by keeping each agent focused on a single domain.
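The routing idea can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions: the keyword list, `routeUtterance`, and `supervisorCalls` are hypothetical names, and a keyword heuristic stands in for the realtime model's own judgment about when to delegate. It only shows that simple turns never reach the supervisor.

```typescript
// Hypothetical chat-supervisor routing sketch. A keyword heuristic stands in
// for the realtime model deciding when to delegate; it is not the demo's logic.

type Route = "voice" | "supervisor";

// Assumed markers of turns that need tools or deeper reasoning.
const COMPLEX_MARKERS: string[] = ["account", "refund", "order", "policy", "compare"];

// Decide which model handles a single user utterance.
function routeUtterance(utterance: string): Route {
  const text = utterance.toLowerCase();
  return COMPLEX_MARKERS.some((marker) => text.includes(marker)) ? "supervisor" : "voice";
}

// Count how many turns would reach the more expensive supervisor model.
function supervisorCalls(turns: string[]): number {
  return turns.filter((turn) => routeUtterance(turn) === "supervisor").length;
}

const conversation: string[] = [
  "Hi there!",
  "Can you check my order status?",
  "Thanks, that's all.",
];

console.log(supervisorCalls(conversation)); // 1 (only the order question is delegated)
```

In a real deployment the decision would come from the realtime model (for example, via a tool call), but the cost effect is the same: the supervisor is billed only for the turns that need it.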
How to use
- Clone the repository:
  git clone https://github.com/openai/openai-realtime-agents.git
- Install dependencies:
  cd openai-realtime-agents && npm i
- Set your API key:
  export OPENAI_API_KEY=sk-your-key-here
- Run the demo:
  npm run dev
  then open http://localhost:3000
Example
Chat-Supervisor Pattern:
  User (voice) <-> [Realtime Voice Agent] <-> [Supervisor GPT-4.1]
  - Realtime Voice Agent: handles chat and simple tasks
  - Supervisor GPT-4.1: handles complex tool calls and decision-making
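A minimal sketch of the delegation step, assuming a hypothetical `Supervisor` function type that wraps a text-model call (mocked here), a `delegate` flag supplied by the caller, and a canned placeholder reply for simple turns. It is not the demo's code; it only illustrates that the voice agent owns the transcript and hands the supervisor the full context when it escalates.

```typescript
// Hypothetical chat-supervisor delegation sketch. `Supervisor` stands in for a
// call to a text model such as GPT-4.1; here it is mocked.

interface Message {
  role: "user" | "assistant";
  content: string;
}

type Supervisor = (history: Message[]) => Promise<string>;

class VoiceAgent {
  private history: Message[] = [];
  private supervisor: Supervisor;

  constructor(supervisor: Supervisor) {
    this.supervisor = supervisor;
  }

  // Handle one user turn; `delegate` marks turns the voice agent should escalate.
  async handleTurn(userText: string, delegate: boolean): Promise<string> {
    this.history.push({ role: "user", content: userText });
    const reply = delegate
      ? await this.supervisor([...this.history]) // supervisor sees a copy of the full transcript
      : "Sure, happy to help with that."; // canned placeholder reply for simple turns
    this.history.push({ role: "assistant", content: reply });
    return reply;
  }

  transcriptLength(): number {
    return this.history.length;
  }
}

// Mock supervisor: a real one would call a text model, possibly with tools.
const mockSupervisor: Supervisor = async (history) =>
  `Supervisor reviewed ${history.length} message(s).`;

async function demo(): Promise<void> {
  const agent = new VoiceAgent(mockSupervisor);
  await agent.handleTurn("Hello!", false);
  const answer = await agent.handleTurn("Why was I charged twice?", true);
  console.log(answer); // Supervisor reviewed 3 message(s).
}

demo();
```

Because the supervisor receives a copy of the history, it can reason over the whole conversation while the voice agent keeps answering simple turns on its own.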
Sequential Handoff Pattern:
  User -> [Greeter Agent] -> [Sales Agent] -> [Support Agent]
  - Greeter Agent: detects intent and routes
  - Sales Agent: handles sales queries
  - Support Agent: handles support queries
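The handoff flow can be modeled as a small state machine. In this sketch, which is an illustration under assumptions, `detectIntent`, `makeAgent`, and `runConversation` are hypothetical helpers, regular expressions replace the model-driven intent detection the demo relies on, and any agent may hand off to any other. The agent names mirror the diagram above.

```typescript
// Hypothetical sequential-handoff sketch: each agent owns one domain and either
// keeps the conversation or names the agent that should take over.

type AgentName = "greeter" | "sales" | "support";

interface Agent {
  name: AgentName;
  // Returns the agent to hand off to, or null to keep the conversation.
  route(utterance: string): AgentName | null;
}

// Keyword-based intent detection, standing in for the model's own routing.
function detectIntent(utterance: string): AgentName | null {
  const text = utterance.toLowerCase();
  if (/\b(buy|price|upgrade)\b/.test(text)) return "sales";
  if (/\b(broken|error|help)\b/.test(text)) return "support";
  return null;
}

// Each agent hands off only when the detected intent belongs to another agent.
function makeAgent(name: AgentName): Agent {
  return {
    name,
    route(utterance: string): AgentName | null {
      const target = detectIntent(utterance);
      return target !== null && target !== name ? target : null;
    },
  };
}

const agents: Record<AgentName, Agent> = {
  greeter: makeAgent("greeter"),
  sales: makeAgent("sales"),
  support: makeAgent("support"),
};

// Walk a conversation and record which agent is active after each turn.
function runConversation(turns: string[]): AgentName[] {
  let active: AgentName = "greeter";
  const history: AgentName[] = [];
  for (const turn of turns) {
    const next = agents[active].route(turn);
    if (next !== null) active = next; // handoff to the specialist
    history.push(active);
  }
  return history;
}

console.log(runConversation(["Hi!", "What is the price of the pro plan?", "The app shows an error"]));
// [ 'greeter', 'sales', 'support' ]
```

Keeping the routing rule inside each agent means a specialist only needs to know when to let go, not the full set of domains.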
Related on TokRepo
- Multi-Agent Frameworks -- Compare agent orchestration frameworks
- AI Tools for Agents -- Tools for building AI agent systems
Common pitfalls
- Browsers only allow microphone capture for WebRTC in a secure context (HTTPS). localhost is treated as secure, so it works for development, but a deployed app needs valid TLS certificates.
- The Realtime API is priced differently from the standard chat API, with audio tokens billed at their own rates; monitor usage carefully during development.
- Audio quality depends on network conditions; implement error handling and reconnection for dropped connections.
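For the dropped-connection pitfall, a common remedy is retrying with capped exponential backoff. The sketch below assumes a caller-supplied `connect` function (hypothetical; for instance, one that re-creates the WebRTC peer connection) and default delays of 500 ms doubling up to 8 s; tune these for your deployment.

```typescript
// Hypothetical reconnect-with-backoff sketch. `connect` is a caller-supplied
// function (for example, one that re-creates the WebRTC peer connection);
// it is not an API from the demo or the Agents SDK.

// Capped exponential delay for a zero-based retry number: baseMs * 2^retry, at most maxMs.
function delayFor(retry: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(baseMs * 2 ** retry, maxMs);
}

// The full delay schedule for a given number of retries.
function backoffDelays(retries: number, baseMs = 500, maxMs = 8000): number[] {
  return Array.from({ length: retries }, (_, retry) => delayFor(retry, baseMs, maxMs));
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Try to connect up to `attempts` times; resolves with the attempt number that succeeded.
async function connectWithRetry(
  connect: () => Promise<void>,
  attempts = 5,
  baseMs = 500,
  maxMs = 8000,
): Promise<number> {
  for (let i = 0; i < attempts; i++) {
    try {
      await connect();
      return i + 1;
    } catch (err) {
      if (i === attempts - 1) throw err; // out of attempts: surface the last error
      await sleep(delayFor(i, baseMs, maxMs)); // wait before the next attempt
    }
  }
  throw new Error("attempts must be at least 1");
}

console.log(backoffDelays(6)); // [ 500, 1000, 2000, 4000, 8000, 8000 ]
```

Capping the delay keeps a long outage from pushing retries minutes apart, while the doubling avoids hammering the server during brief network drops.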
FAQ
What is the chat-supervisor pattern?
The chat-supervisor pattern uses a lightweight realtime voice agent for conversation flow and delegates complex reasoning or tool calls to a more capable text model such as GPT-4.1. This balances responsiveness with intelligence while controlling costs.
How does sequential handoff work?
Specialized agents handle different domains. When a user's intent changes, the current agent hands off the conversation to the next appropriate agent. Each agent keeps a context focused on its own domain, without carrying unnecessary conversation history.
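One way to keep each agent's context focused is to pass a compact summary and only the latest turns at handoff time. This is an assumption for illustration, not a description of how the Agents SDK transfers history; `buildHandoffContext`, the summary wording, and the default `keepLast` are hypothetical.

```typescript
// Hypothetical handoff-context sketch: the receiving agent gets a one-line
// summary plus the last few turns instead of the full transcript. The summary
// wording and the default `keepLast` are assumptions, not Agents SDK behavior.

interface Turn {
  speaker: "user" | "agent";
  text: string;
}

interface HandoffContext {
  summary: string;
  recent: Turn[];
}

function buildHandoffContext(
  transcript: Turn[],
  fromAgent: string,
  intent: string,
  keepLast = 2,
): HandoffContext {
  return {
    summary: `Transferred from ${fromAgent}; user intent: ${intent}.`,
    // slice(-0) would return everything, so handle keepLast <= 0 explicitly.
    recent: keepLast > 0 ? transcript.slice(-keepLast) : [],
  };
}

const transcript: Turn[] = [
  { speaker: "user", text: "Hi" },
  { speaker: "agent", text: "Hello! How can I help?" },
  { speaker: "user", text: "I'd like to upgrade my plan" },
  { speaker: "agent", text: "Let me transfer you to sales." },
];

const ctx = buildHandoffContext(transcript, "greeter", "sales");
console.log(ctx.summary); // Transferred from greeter; user intent: sales.
console.log(ctx.recent.length); // 2
```

The trade-off is recall versus prompt size: a larger `keepLast` preserves more detail for the specialist at the cost of a longer prompt on every turn.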
Which models does the demo use?
The demo uses OpenAI Realtime API models for voice interaction and GPT-4.1 as the supervisor text model. The Agents SDK orchestrates handoffs and tool calls between the models.
Is the demo production-ready?
The demo is a reference implementation, not a production-ready service. You would need to add authentication, error handling, scaling infrastructure, and monitoring before deploying to production.
Can it be used with other providers?
The demo is tightly coupled to OpenAI's Realtime API and Agents SDK. Adapting it to other providers would require replacing the WebRTC voice streaming layer and the agent orchestration SDK.
Source and credits
Created by OpenAI. Licensed under MIT.
openai-realtime-agents: 6,800+ GitHub stars
Related assets
OpenAI Agents SDK — Multi-Agent Workflows in Python
Official OpenAI framework for building multi-agent workflows. Handoffs between agents, tool calling, guardrails, tracing, and streaming. Lightweight, Python-native. 20K+ stars.
OpenAI Agents JS — TypeScript Multi-Agent SDK
OpenAI Agents JS brings multi-agent workflows to TypeScript with provider-agnostic runs, zod schemas, tracing hooks, and sandbox agent patterns.
Groq Whisper — Sub-Second Speech-to-Text for Voice Agents
Whisper-large-v3 on Groq runs 166× realtime — 60-sec clip in <400ms. OpenAI-compat audio.transcriptions endpoint for voice agents.
OpenAI Agents SDK — Build Multi-Agent Systems in Python
Official OpenAI Python SDK for building multi-agent systems with handoffs, guardrails, and tracing. Agents delegate to specialists, enforce safety rules, and produce observable traces. 8,000+ stars.