Quick Use
- Sign up at vapi.ai — copy your API key
- Buy a Twilio phone number and link it in Vapi → Phone Numbers
- POST the assistant JSON below to /assistant, then POST to /call/phone to dial out
Intro
Vapi is the voice AI agent platform — STT (Deepgram / Whisper), LLM (GPT-4o / Claude / custom), TTS (ElevenLabs / Cartesia / PlayHT), and turn-taking glue exposed through one API. Spin up an outbound or inbound phone agent in 5 minutes. Best for: founders building voice products without composing 5 vendor SDKs themselves. Works with: Twilio numbers, Vonage, custom SIP. Setup time: 5 minutes (sign up + a phone number).
Create your first voice agent
curl -X POST https://api.vapi.ai/assistant \
  -H "Authorization: Bearer $VAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Acme Concierge",
    "firstMessage": "Hi, this is Acme. How can I help you today?",
    "model": {
      "provider": "openai",
      "model": "gpt-4o",
      "messages": [
        {
          "role": "system",
          "content": "You are a concierge for Acme Hotels. Greet the caller, ask how you can help, and stay friendly. If they want to book, ask for dates and party size. Speak naturally and concisely."
        }
      ]
    },
    "voice": {
      "provider": "11labs",
      "voiceId": "21m00Tcm4TlvDq8ikWAM"
    },
    "transcriber": {
      "provider": "deepgram",
      "model": "nova-2"
    }
  }'

Make an outbound call
curl -X POST https://api.vapi.ai/call/phone \
  -H "Authorization: Bearer $VAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "phoneNumberId": "your-twilio-number-id",
    "customer": { "number": "+15551234567" },
    "assistantId": "assistant-id-from-step-1"
  }'

Vapi calls the customer, plays the firstMessage, transcribes their speech in real time, sends the transcript to GPT-4o, and streams the response through ElevenLabs back to the caller, with sub-second turn-taking.
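The same two steps can be scripted end to end. A minimal Python sketch using only the standard library (the phone-number ID and customer number are placeholders; payloads mirror the curl examples):

```python
import json
import os
import urllib.request

VAPI_BASE = "https://api.vapi.ai"


def assistant_payload() -> dict:
    """Assistant config, same shape as the curl example above."""
    return {
        "name": "Acme Concierge",
        "firstMessage": "Hi, this is Acme. How can I help you today?",
        "model": {
            "provider": "openai",
            "model": "gpt-4o",
            "messages": [
                {"role": "system", "content": "You are a concierge for Acme Hotels."}
            ],
        },
        "voice": {"provider": "11labs", "voiceId": "21m00Tcm4TlvDq8ikWAM"},
        "transcriber": {"provider": "deepgram", "model": "nova-2"},
    }


def call_payload(assistant_id: str, phone_number_id: str, customer: str) -> dict:
    """Body for POST /call/phone."""
    return {
        "phoneNumberId": phone_number_id,
        "customer": {"number": customer},
        "assistantId": assistant_id,
    }


def vapi_post(path: str, body: dict) -> dict:
    """POST a JSON body to the Vapi API and return the parsed response."""
    req = urllib.request.Request(
        VAPI_BASE + path,
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['VAPI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Create the assistant, then dial out with it.
    assistant = vapi_post("/assistant", assistant_payload())
    vapi_post(
        "/call/phone",
        call_payload(assistant["id"], "your-twilio-number-id", "+15551234567"),
    )
```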
Add custom tools
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "check_availability",
        "description": "Check room availability for given dates and party size",
        "parameters": {
          "type": "object",
          "properties": {
            "checkin": { "type": "string", "format": "date" },
            "checkout": { "type": "string", "format": "date" },
            "guests": { "type": "integer" }
          },
          "required": ["checkin", "checkout", "guests"]
        }
      },
      "server": {
        "url": "https://your-backend.example.com/check-availability"
      }
    }
  ]
}

When the LLM decides to call check_availability, Vapi POSTs to your backend, gets the result, and the LLM continues the call with the data.
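On the backend side, you need an HTTP endpoint that accepts Vapi's tool-call POST and returns results. A minimal stdlib-only Python sketch — the exact field names (message, toolCallList, toolCallId, results) are assumptions to verify against Vapi's server-message docs, and check_availability is a placeholder for your real inventory lookup:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_availability(checkin: str, checkout: str, guests: int) -> dict:
    """Placeholder business logic -- query your real inventory here."""
    return {"available": True, "nightly_rate": 189, "currency": "USD"}


def handle_tool_calls(payload: dict) -> dict:
    """Run each requested tool call and build the response body.

    Assumes Vapi sends {"message": {"toolCallList": [...]}} and expects
    {"results": [{"toolCallId": ..., "result": ...}]} back.
    """
    results = []
    for call in payload.get("message", {}).get("toolCallList", []):
        args = call["function"]["arguments"]
        result = check_availability(args["checkin"], args["checkout"], args["guests"])
        results.append({"toolCallId": call["id"], "result": json.dumps(result)})
    return {"results": results}


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        reply = json.dumps(handle_tool_calls(body)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)


if __name__ == "__main__":
    HTTPServer(("", 8080), Handler).serve_forever()
```

In production you would also verify the request came from Vapi (e.g. a shared secret header) before acting on it.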
Why use Vapi vs roll-your-own
Building this stack manually means: Twilio Media Streams + Deepgram WebSocket + your LLM + ElevenLabs streaming WebSocket + a state machine for VAD + barge-in. Vapi packages it. The trade-off: vendor lock-in on the audio pipeline.
FAQ
Q: Is Vapi free? A: Vapi has a free trial with included minutes. After that it's pay-per-minute (~$0.05-0.20/min depending on the STT/LLM/TTS combo). You can also bring your own provider keys to avoid Vapi's markup. Pricing on vapi.ai/pricing.
Q: Can I use Claude instead of GPT-4o?
A: Yes — Vapi supports OpenAI, Anthropic, Google, Groq, Together, and custom OpenAI-compatible endpoints (so you can plug in Codestral via Mistral or a LiteLLM proxy). Switch via the model.provider field.
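As a sketch, switching the assistant above to Claude only changes the model block (the model string is an example — check Anthropic's current model list):

```json
"model": {
  "provider": "anthropic",
  "model": "claude-3-5-sonnet-20241022"
}
```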
Q: How fast is the turn-taking? A: Vapi targets ~500-800ms first-byte latency end-to-end. The biggest variable is the LLM — GPT-4o-mini is fastest, Claude Sonnet is highest quality. With OpenAI Realtime as the model, latency drops to ~300-400ms.
Source & Thanks
Built by Vapi. Commercial product with free trial.
vapi.ai — API documentation