# Vapi — Voice AI Agent Platform with STT, LLM & TTS

> Vapi glues STT, LLM, TTS, and turn-taking into one voice agent API. Build phone agents in minutes on a Twilio + Deepgram + ElevenLabs + GPT-4o stack.

## Install

Copy the content below into your project:

## Quick Use

1. Sign up at vapi.ai and copy your API key
2. Buy a Twilio phone number and link it in Vapi → Phone Numbers
3. POST the assistant JSON below to `/assistant`, then `/call/phone` to dial out

---

## Intro

Vapi is a voice AI agent platform — STT (Deepgram / Whisper), LLM (GPT-4o / Claude / custom), TTS (ElevenLabs / Cartesia / PlayHT), and the turn-taking glue, all exposed through one API. Spin up an outbound or inbound phone agent in 5 minutes.

Best for: founders building voice products without composing five vendor SDKs themselves.

Works with: Twilio numbers, Vonage, custom SIP.

Setup time: 5 minutes (sign-up + a phone number).

---

### Create your first voice agent

```bash
curl -X POST https://api.vapi.ai/assistant \
  -H "Authorization: Bearer $VAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Acme Concierge",
    "firstMessage": "Hi, this is Acme. How can I help you today?",
    "model": {
      "provider": "openai",
      "model": "gpt-4o",
      "messages": [
        {
          "role": "system",
          "content": "You are a concierge for Acme Hotels. Greet the caller, ask how you can help, and stay friendly. If they want to book, ask for dates and party size. Speak naturally and concisely."
        }
      ]
    },
    "voice": { "provider": "11labs", "voiceId": "21m00Tcm4TlvDq8ikWAM" },
    "transcriber": { "provider": "deepgram", "model": "nova-2" }
  }'
```

### Make an outbound call

```bash
curl -X POST https://api.vapi.ai/call/phone \
  -H "Authorization: Bearer $VAPI_API_KEY" \
  -d '{
    "phoneNumberId": "your-twilio-number-id",
    "customer": { "number": "+15551234567" },
    "assistantId": "assistant-id-from-step-1"
  }'
```

Vapi calls the customer, plays the `firstMessage`, transcribes their speech in real time, sends it to GPT-4o, and streams the response back to the caller through ElevenLabs, with sub-second turn-taking.

### Add custom tools

```jsonc
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "check_availability",
        "description": "Check room availability for given dates and party size",
        "parameters": {
          "type": "object",
          "properties": {
            "checkin":  { "type": "string", "format": "date" },
            "checkout": { "type": "string", "format": "date" },
            "guests":   { "type": "integer" }
          },
          "required": ["checkin", "checkout", "guests"]
        }
      },
      "server": { "url": "https://your-backend.example.com/check-availability" }
    }
  ]
}
```

When the LLM decides to call `check_availability`, Vapi POSTs to your backend, gets the result, and the LLM continues the call with the data.

### Why use Vapi vs roll-your-own

Building this stack manually means Twilio Media Streams + a Deepgram WebSocket + your LLM + an ElevenLabs streaming WebSocket + a state machine for VAD and barge-in. Vapi packages it. The trade-off: vendor lock-in on the audio pipeline.

---

### FAQ

**Q: Is Vapi free?**
A: Vapi has a free trial with included minutes. After that it's pay-per-minute (~$0.05-0.20/min depending on the STT/LLM/TTS combo). You can also bring your own provider keys to avoid Vapi's markup. Pricing is at vapi.ai/pricing.

**Q: Can I use Claude instead of GPT-4o?**
A: Yes — Vapi supports OpenAI, Anthropic, Google, Groq, Together (including its Llama models), and custom OpenAI-compatible endpoints (so you can plug in Codestral via Mistral or a LiteLLM proxy).
Switch via the `model.provider` field.

**Q: How fast is the turn-taking?**
A: Vapi targets ~500-800ms first-byte latency end-to-end. The biggest variable is the LLM — GPT-4o-mini is fastest; Claude Sonnet is highest quality. With OpenAI Realtime as the model, latency drops to ~300-400ms.

---

## Source & Thanks

> Built by [Vapi](https://github.com/VapiAI). Commercial product with free trial.
>
> [vapi.ai](https://vapi.ai) — API documentation

---

Source: https://tokrepo.com/en/workflows/vapi-voice-ai-agent-platform-with-stt-llm-tts
Author: Vapi