# Vapi — Voice AI Agent Platform with STT, LLM & TTS

> Vapi glues STT, LLM, TTS, and turn-taking into one voice agent API. Build phone agents in minutes on a Twilio + Deepgram + ElevenLabs + GPT-4o stack.

## Install

Copy the content below into your project:

## Quick Use

1. Sign up at vapi.ai and copy your API key
2. Buy a Twilio phone number and link it in Vapi → Phone Numbers
3. POST the assistant JSON below to `/assistant`, then `/call/phone` to dial out

---

## Intro

Vapi is a voice AI agent platform — STT (Deepgram / Whisper), LLM (GPT-4o / Claude / custom), TTS (ElevenLabs / Cartesia / PlayHT), and the turn-taking glue, all exposed through one API. Spin up an outbound or inbound phone agent in 5 minutes.

Best for: founders building voice products without composing five vendor SDKs themselves.

Works with: Twilio numbers, Vonage, custom SIP.

Setup time: 5 minutes (sign-up + a phone number).

---

### Create your first voice agent

```bash
curl -X POST https://api.vapi.ai/assistant \
  -H "Authorization: Bearer $VAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Acme Concierge",
    "firstMessage": "Hi, this is Acme. How can I help you today?",
    "model": {
      "provider": "openai",
      "model": "gpt-4o",
      "messages": [
        {
          "role": "system",
          "content": "You are a concierge for Acme Hotels. Greet the caller, ask how you can help, and stay friendly. If they want to book, ask for dates and party size. Speak naturally and concisely."
        }
      ]
    },
    "voice": { "provider": "11labs", "voiceId": "21m00Tcm4TlvDq8ikWAM" },
    "transcriber": { "provider": "deepgram", "model": "nova-2" }
  }'
```

### Make an outbound call

```bash
curl -X POST https://api.vapi.ai/call/phone \
  -H "Authorization: Bearer $VAPI_API_KEY" \
  -d '{
    "phoneNumberId": "your-twilio-number-id",
    "customer": { "number": "+15551234567" },
    "assistantId": "assistant-id-from-step-1"
  }'
```

Vapi calls the customer, plays the `firstMessage`, transcribes their speech in real time, sends it to GPT-4o, and streams the response back to the caller through ElevenLabs, with sub-second turn-taking.

### Add custom tools

```jsonc
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "check_availability",
        "description": "Check room availability for given dates and party size",
        "parameters": {
          "type": "object",
          "properties": {
            "checkin":  { "type": "string", "format": "date" },
            "checkout": { "type": "string", "format": "date" },
            "guests":   { "type": "integer" }
          },
          "required": ["checkin", "checkout", "guests"]
        }
      },
      "server": { "url": "https://your-backend.example.com/check-availability" }
    }
  ]
}
```

When the LLM decides to call `check_availability`, Vapi POSTs to your backend, gets the result, and the LLM continues the call with the data.

### Why use Vapi vs roll-your-own

Building this stack manually means Twilio Media Streams + a Deepgram WebSocket + your LLM + an ElevenLabs streaming WebSocket + a state machine for VAD and barge-in. Vapi packages it. The trade-off: vendor lock-in on the audio pipeline.

---

### FAQ

**Q: Is Vapi free?**
A: Vapi has a free trial with included minutes. After that it's pay-per-minute (~$0.05-0.20/min depending on the STT/LLM/TTS combo). You can also bring your own provider keys to avoid Vapi's markup. Pricing is at vapi.ai/pricing.

**Q: Can I use Claude instead of GPT-4o?**
A: Yes — Vapi supports OpenAI, Anthropic, Google, Groq, Together (including its Llama models), and custom OpenAI-compatible endpoints (so you can plug in Codestral via Mistral or a LiteLLM proxy).
Switch via the `model.provider` field.

**Q: How fast is the turn-taking?**
A: Vapi targets ~500-800ms first-byte latency end-to-end. The biggest variable is the LLM — GPT-4o-mini is fastest; Claude Sonnet is highest quality. With OpenAI Realtime as the model, latency drops to ~300-400ms.

---

## Source & Thanks

> Built by [Vapi](https://github.com/VapiAI). Commercial product with free trial.
>
> [vapi.ai](https://vapi.ai) — API documentation

---

Source: https://tokrepo.com/en/workflows/vapi-voice-ai-agent-platform-with-stt-llm-tts
Author: Vapi