OpenAI Realtime Agents — Voice AI Agent Architecture
Two Agent Patterns
1. Chat-Supervisor Pattern
A realtime voice agent handles user conversation and basic tasks, while a more intelligent text-based supervisor model (GPT-4.1) handles complex tool calls and decision-making.
User ←→ [Voice Agent (realtime)] ←→ [Supervisor (GPT-4.1)]
                   ↓                          ↓
              Basic tasks             Complex reasoning
               Voice I/O                 Tool calls
When to use: When you need natural voice interaction but also complex reasoning that requires a more capable model.
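The split of responsibilities in this pattern can be sketched in a few lines of TypeScript. This is a minimal illustration, not the demo's actual code: `VoiceReply`, `Supervisor`, `BASIC_REPLIES`, `voiceAgentTurn`, and `stubSupervisor` are hypothetical names, the keyword matching stands in for whatever the realtime model decides on its own, and the supervisor is a local synchronous stub so the sketch runs offline. A real supervisor call would be an asynchronous request to a text model.

```typescript
// Hypothetical sketch of chat-supervisor routing; no OpenAI SDK calls are made.
type VoiceReply = { handledBy: "voice" | "supervisor"; text: string };

// Stand-in for the text-based supervisor model (synchronous here for brevity).
type Supervisor = (request: string) => string;

// Basic tasks the voice agent answers directly (assumed set, for illustration).
const BASIC_REPLIES: Array<[RegExp, string]> = [
  [/^(hi|hello|hey)\b/i, "Hello! How can I help?"],
  [/\bthank(s| you)\b/i, "You're welcome!"],
];

function voiceAgentTurn(utterance: string, supervisor: Supervisor): VoiceReply {
  const text = utterance.trim();
  for (const [pattern, reply] of BASIC_REPLIES) {
    if (pattern.test(text)) {
      // Basic task: the voice agent replies without escalating.
      return { handledBy: "voice", text: reply };
    }
  }
  // Everything else is escalated; the voice agent only relays the answer.
  return { handledBy: "supervisor", text: supervisor(text) };
}

// Stub supervisor that echoes the request it was given.
const stubSupervisor: Supervisor = (request) => `Supervisor answer for: ${request}`;

console.log(voiceAgentTurn("hello there", stubSupervisor).handledBy);
console.log(voiceAgentTurn("Please refund order 123", stubSupervisor).handledBy);
```

The design point is that the voice agent stays responsive for simple turns, while the slower, more capable model is only consulted when the request needs it.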
2. Sequential Handoff Pattern
Specialized agents transfer users between them based on detected intent. Inspired by the OpenAI Swarm pattern.
User → [Greeter Agent] → [Sales Agent] → [Support Agent]
              ↓                ↓                ↓
        Route intent     Handle sales    Handle support
When to use: When different conversation stages require different expertise (e.g., routing → sales → support).
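The handoff flow above can be modeled as a small graph in which each agent lists the agents it may transfer to. The sketch below is an assumption-laden illustration, not the demo's implementation: `HandoffAgent`, `AGENTS`, `detectIntent`, and `nextAgent` are hypothetical names, and the keyword-based intent detection is a toy stand-in for letting the model choose the handoff itself.

```typescript
// Hypothetical sketch of sequential handoffs; agent names and keywords are assumptions.
type AgentName = "greeter" | "sales" | "support";

interface HandoffAgent {
  name: AgentName;
  // Agents this one is allowed to hand off to.
  handoffs: AgentName[];
}

const AGENTS: Record<AgentName, HandoffAgent> = {
  greeter: { name: "greeter", handoffs: ["sales", "support"] },
  sales: { name: "sales", handoffs: ["support"] },
  support: { name: "support", handoffs: [] },
};

// Toy keyword-based intent detection; returns null when no intent is recognized.
function detectIntent(utterance: string): AgentName | null {
  const text = utterance.toLowerCase();
  if (/\b(buy|price|pricing|upgrade)\b/.test(text)) return "sales";
  if (/\b(broken|error|help|refund)\b/.test(text)) return "support";
  return null;
}

// Returns the agent that should handle the next turn.
function nextAgent(current: AgentName, utterance: string): AgentName {
  const target = detectIntent(utterance);
  if (target !== null && AGENTS[current].handoffs.some((h) => h === target)) {
    return target; // permitted handoff
  }
  return current; // no recognized intent, or handoff not allowed: stay put
}

let active: AgentName = "greeter";
for (const turn of ["hi", "what is the price?", "my device is broken"]) {
  active = nextAgent(active, turn);
  console.log(`${turn} -> ${active}`);
}
```

Restricting each agent to an explicit `handoffs` list keeps transfers predictable: in this sketch the support agent never routes a user back to sales, whatever the user says.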
Key Technologies
| Technology | Purpose |
|---|---|
| OpenAI Realtime API | Low-latency voice streaming |
| OpenAI Agents SDK | Multi-agent orchestration |
| WebRTC | Browser-based voice I/O |
| GPT-4.1 | Text-based supervisor reasoning |
Setup
git clone https://github.com/openai/openai-realtime-agents.git
cd openai-realtime-agents
npm i
export OPENAI_API_KEY=sk-your-key
npm run dev
FAQ
Q: What is OpenAI Realtime Agents? A: An official OpenAI demo showing how to build sophisticated voice AI agents using the Realtime API and Agents SDK, with patterns like chat-supervisor hierarchies and sequential handoffs.
Q: Is it free to use? A: The code is MIT licensed. You pay for OpenAI API usage (Realtime API pricing applies).
Q: Can I use this in production? A: It is a demo and reference implementation rather than a production-ready application. Reuse its patterns and architecture in your own production applications.