OpenAI Realtime Agents — Voice AI Agent Patterns
Advanced agentic patterns for voice AI built on the OpenAI Realtime API: chat-supervisor and sequential handoff, with WebRTC streaming. MIT, 6,800+ stars.
What it is
OpenAI Realtime Agents is an official OpenAI demo showcasing advanced agentic patterns for voice AI. It demonstrates two key patterns: Chat-Supervisor (a realtime voice agent delegates complex tasks to a smarter text model like GPT-4.1) and Sequential Handoff (specialized agents transfer users between each other based on intent). Built with the OpenAI Agents SDK and WebRTC voice streaming.
It is designed for developers building voice-enabled AI applications, customer service bots, or multi-agent voice systems.
How it saves time or tokens
The chat-supervisor pattern reduces token cost by using a fast, lightweight voice model for conversation flow while delegating expensive reasoning to a text model only when needed. This avoids routing every utterance through a large model. The sequential handoff pattern prevents prompt bloat by keeping each agent focused on a single domain.
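The delegation idea can be sketched in a few lines of TypeScript. This is a minimal illustration, not the demo's code: the `Reply` and `Supervisor` types, the `COMPLEX_HINTS` keyword list, and both function names are assumptions made for this example. In practice the realtime model itself would usually decide when to delegate (for instance through a tool call); the keyword check below only stands in for that decision.

```typescript
// Hypothetical types and names throughout; none of these come from the demo repository.
type Reply = { text: string; handledBy: "voice" | "supervisor" };

// Stand-in for a call to the larger text model; injected so the sketch runs offline.
type Supervisor = (utterance: string, history: string[]) => Promise<string>;

// Assumed heuristic: utterances mentioning these topics need tools or multi-step reasoning.
const COMPLEX_HINTS = ["refund", "billing", "cancel", "policy"];

function needsSupervisor(utterance: string): boolean {
  const lower = utterance.toLowerCase();
  return COMPLEX_HINTS.some((hint) => lower.includes(hint));
}

async function handleUtterance(
  utterance: string,
  history: string[],
  supervisor: Supervisor,
): Promise<Reply> {
  history.push(utterance);
  if (!needsSupervisor(utterance)) {
    // Cheap path: the voice agent answers on its own; the text model is never called.
    return { text: "Sure, I can help with that.", handledBy: "voice" };
  }
  // Expensive path: only now is the conversation history sent to the text model.
  const text = await supervisor(utterance, history);
  return { text, handledBy: "supervisor" };
}
```

Because the supervisor is passed in as a function, the routing logic can be exercised with a stub before any paid API call is wired up.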
How to use
- Clone the repository:
  git clone https://github.com/openai/openai-realtime-agents.git
- Install dependencies:
  cd openai-realtime-agents && npm i
- Set your API key:
  export OPENAI_API_KEY=sk-your-key-here
- Run the demo:
  npm run dev, then open http://localhost:3000
Example
Chat-Supervisor Pattern:
User (voice) <-> [Realtime Voice Agent] <-> [Supervisor GPT-4.1]
                          |                          |
                   handles chat               handles complex
                   and simple tasks           tool calls and
                                              decision-making
Sequential Handoff Pattern:
User -> [Greeter Agent] -> [Sales Agent] -> [Support Agent]
              |                  |                 |
        detects intent     handles sales     handles support
        and routes         queries           queries
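The handoff chain in the diagram can be modeled as a small graph of agents, each with its own short instructions and a list of agents it may transfer to. The sketch below is an assumption-laden illustration: the agent names mirror the diagram, but the `Agent` interface, the instruction strings, the keyword-based `detectIntent`, and `route` are all hypothetical and are not taken from the demo or the Agents SDK.

```typescript
// All agent definitions, keywords, and the routing rule are illustrative assumptions.
type AgentName = "greeter" | "sales" | "support";

interface Agent {
  name: AgentName;
  instructions: string; // short, domain-specific prompt
  handoffs: AgentName[]; // agents this one may transfer to
}

const AGENTS: Record<AgentName, Agent> = {
  greeter: {
    name: "greeter",
    instructions: "Greet the user and find out what they need.",
    handoffs: ["sales", "support"],
  },
  sales: {
    name: "sales",
    instructions: "Answer pricing and purchase questions.",
    handoffs: ["support"],
  },
  support: {
    name: "support",
    instructions: "Troubleshoot problems with existing orders.",
    handoffs: [],
  },
};

// Toy intent detector; a real system would let the model choose the handoff.
function detectIntent(utterance: string): AgentName | null {
  const lower = utterance.toLowerCase();
  if (/\b(buy|price|pricing|upgrade)\b/.test(lower)) return "sales";
  if (/\b(broken|error|help|issue)\b/.test(lower)) return "support";
  return null;
}

// Returns the agent that should handle the utterance, transferring only along allowed edges.
function route(current: AgentName, utterance: string): AgentName {
  const target = detectIntent(utterance);
  if (target !== null && AGENTS[current].handoffs.includes(target)) {
    return target;
  }
  return current;
}
```

Keeping the allowed transfers in data rather than in each prompt means every agent's instructions stay short, which is the point of the pattern.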
Related on TokRepo
- Multi-Agent Frameworks -- Compare agent orchestration frameworks
- AI Tools for Agents -- Tools for building AI agent systems
Common pitfalls
- Browsers only grant microphone access in secure contexts, so WebRTC needs HTTPS in production; localhost is exempt for development, but a deployment needs valid TLS certificates
- The realtime API has different pricing from the standard chat API; monitor usage carefully during development
- Audio quality depends on network conditions; implement proper error handling for dropped connections
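For the dropped-connection pitfall, one common approach is to retry the connection with exponential backoff. The sketch below assumes this approach; it is not taken from the demo. `connectWithRetry`, `backoffDelay`, and the default delays are hypothetical, and the actual connect step (for example, setting up the WebRTC session) is passed in as a function so the sketch stays free of any SDK calls.

```typescript
// Hypothetical helper: `connect` and `sleep` are injected so the sketch has no WebRTC dependency.
type Connect<T> = () => Promise<T>;
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Delay doubles on each failed attempt and is capped at maxDelayMs.
function backoffDelay(attempt: number, baseMs = 500, maxDelayMs = 8000): number {
  return Math.min(baseMs * 2 ** attempt, maxDelayMs);
}

async function connectWithRetry<T>(
  connect: Connect<T>,
  maxAttempts = 5,
  sleep: Sleep = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final failure.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelay(attempt));
      }
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

The caller decides what to do once retries are exhausted, such as telling the user that the voice session could not be restored.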
Frequently Asked Questions
What is the chat-supervisor pattern?
The chat-supervisor pattern uses a lightweight realtime voice agent for conversation flow and delegates complex reasoning or tool calls to a more capable text model like GPT-4.1. This balances responsiveness with intelligence while controlling costs.
How does the sequential handoff pattern work?
Specialized agents handle different domains. When a user's intent changes, the current agent hands off the conversation to the next appropriate agent. Each agent maintains focused context for its domain without carrying unnecessary conversation history.
Which models does the demo use?
The demo uses OpenAI's realtime API models for voice interaction and GPT-4.1 as the supervisor text model. The Agents SDK orchestrates the handoffs and tool calls between models.
Is the demo production-ready?
The demo is a reference implementation, not a production-ready service. You would need to add authentication, error handling, scaling infrastructure, and monitoring before deploying to production.
Can it be adapted to other providers?
The demo is tightly coupled to OpenAI's Realtime API and Agents SDK. Adapting it to other providers would require replacing the WebRTC voice streaming layer and the agent orchestration SDK.
Source & Thanks
Created by OpenAI. Licensed under MIT.
openai-realtime-agents — ⭐ 6,800+
Thanks to Noah MacCallum, Ilan Bigio, and the OpenAI team for demonstrating production voice AI patterns.