What is Vapi — Voice AI Agent Platform with STT, LLM & TTS?

Vapi combines speech-to-text (STT), an LLM, text-to-speech (TTS), and turn-taking behind a single voice agent API, so you can build phone agents in minutes. A typical stack is Twilio + Deepgram + ElevenLabs + GPT-4o.

Is Vapi — Voice AI Agent Platform with STT, LLM & TTS free to use?

Yes. Vapi — Voice AI Agent Platform with STT, LLM & TTS is freely available on TokRepo. Check the Source & Thanks section on the asset page for the specific open-source license.

How do I install Vapi — Voice AI Agent Platform with STT, LLM & TTS?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

Vapi — Voice AI Agent Platform with STT, LLM & TTS

Name: Vapi — Voice AI Agent Platform with STT, LLM & TTS
Author: Vapi

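Introduction

Vapi is a voice AI agent platform: STT (Deepgram / Whisper), LLM (GPT-4o / Claude / custom), TTS (ElevenLabs / Cartesia / PlayHT), and the turn-taking glue are all exposed through a single API. You can stand up an outbound or inbound phone agent in about 5 minutes. It suits voice-product founders who do not want to stitch together five vendor SDKs themselves. It works with Twilio numbers, Vonage, and custom SIP. Setup time: about 5 minutes (sign-up plus one phone number).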

Create your first voice agent

curl -X POST https://api.vapi.ai/assistant \
  -H "Authorization: Bearer $VAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Acme Concierge",
    "firstMessage": "Hi, this is Acme. How can I help you today?",
    "model": {
      "provider": "openai",
      "model": "gpt-4o",
      "messages": [
        {
          "role": "system",
          "content": "You are a concierge for Acme Hotels. Greet the caller, ask how you can help, and stay friendly. If they want to book, ask for dates and party size. Speak naturally and concisely."
        }
      ]
    },
    "voice": {
      "provider": "11labs",
      "voiceId": "21m00Tcm4TlvDq8ikWAM"
    },
    "transcriber": {
      "provider": "deepgram",
      "model": "nova-2"
    }
  }'

Place an outbound call

curl -X POST https://api.vapi.ai/call/phone \
  -H "Authorization: Bearer $VAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "phoneNumberId": "your-twilio-number-id",
    "customer": { "number": "+15551234567" },
    "assistantId": "assistant-id-from-step-1"
  }'

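Vapi dials the customer's number, plays firstMessage, transcribes the caller's speech in real time, sends it to GPT-4o, and streams the response through ElevenLabs for playback. Turn-taking is sub-second.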
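The two curl steps above can be chained in a script so that the assistant ID does not have to be copied by hand. The following is a minimal sketch using only the Python standard library. It reuses the endpoints and payload fields from the curl examples; the helper names (`build_request`, `send_json`, `create_assistant_and_call`) are hypothetical, and it assumes the create-assistant response is a JSON object whose `id` field holds the new assistant's ID.

```python
import json
import urllib.request

API_BASE = "https://api.vapi.ai"


def build_request(path, payload, api_key):
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        API_BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_json(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def create_assistant_and_call(assistant, phone_number_id, customer_number,
                              api_key, transport=send_json):
    """Create the assistant, then dial the customer with the returned ID."""
    created = transport(build_request("/assistant", assistant, api_key))
    call = {
        "phoneNumberId": phone_number_id,
        "customer": {"number": customer_number},
        # Assumption: the create response exposes the assistant ID as "id".
        "assistantId": created["id"],
    }
    return transport(build_request("/call/phone", call, api_key))
```

To place a real call, pass the assistant JSON from the first curl example, your phone number ID, the customer's number, and the value of `VAPI_API_KEY`. The `transport` parameter exists so the flow can be exercised with a stub instead of the network.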

Add a custom tool

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "check_availability",
        "description": "Check room availability for given dates and party size",
        "parameters": {
          "type": "object",
          "properties": {
            "checkin": { "type": "string", "format": "date" },
            "checkout": { "type": "string", "format": "date" },
            "guests": { "type": "integer" }
          },
          "required": ["checkin", "checkout", "guests"]
        }
      },
      "server": {
        "url": "https://your-backend.example.com/check-availability"
      }
    }
  ]
}

When the LLM decides to call check_availability, Vapi POSTs to your backend; once the result comes back, the LLM uses it to continue the call.
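The backend side of that round trip could look roughly like the sketch below. The source does not show the webhook payload, so the shapes here are assumptions to verify against Vapi's server documentation: the request body is assumed to carry the calls under `message.toolCallList` (each with an `id`, a `name`, and `arguments`, possibly nested under `function`), and the reply is assumed to be a `results` list keyed by `toolCallId`. The availability rule and `MAX_GUESTS_PER_ROOM` are purely hypothetical stand-ins for a real inventory lookup.

```python
import json
from datetime import date

MAX_GUESTS_PER_ROOM = 4  # hypothetical business rule, not from Vapi


def check_availability(checkin, checkout, guests):
    """Toy availability check standing in for a real inventory query."""
    nights = (date.fromisoformat(checkout) - date.fromisoformat(checkin)).days
    if nights <= 0:
        return {"available": False, "reason": "checkout must be after checkin"}
    if guests > MAX_GUESTS_PER_ROOM:
        return {"available": False, "reason": "party too large for one room"}
    return {"available": True, "nights": nights}


def handle_tool_calls(body):
    """Turn an (assumed) tool-call webhook body into an (assumed) reply."""
    results = []
    for call in body["message"]["toolCallList"]:
        # Name and arguments may sit at the top level or under "function".
        fn = call.get("function", call)
        args = fn["arguments"]
        if isinstance(args, str):  # arguments may arrive JSON-encoded
            args = json.loads(args)
        if fn["name"] == "check_availability":
            outcome = check_availability(**args)
        else:
            outcome = {"error": f"unknown tool {fn['name']}"}
        results.append({"toolCallId": call["id"], "result": json.dumps(outcome)})
    return {"results": results}
```

Keeping `handle_tool_calls` a pure function of the parsed request body means it can sit behind any web framework at the `server.url` endpoint and be unit-tested without a live call.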

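Why use Vapi instead of building it yourself

Building this stack by hand means Twilio Media Streams + a Deepgram WebSocket + your LLM + an ElevenLabs streaming WebSocket + a VAD state machine + barge-in handling. Vapi packages all of that. The trade-off: vendor lock-in on the audio pipeline.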

FAQ

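Q: Is Vapi free? A: Vapi offers a free trial that includes some call minutes. After that you pay per minute (roughly $0.05-0.20/minute, depending on the STT/LLM/TTS combination). You can also bring your own provider keys to avoid Vapi's markup. See vapi.ai/pricing for pricing.

Q: Can I use Claude instead of GPT-4o? A: Yes. Vapi supports OpenAI / Anthropic / Google / Groq / Together (including Llama models hosted on Together) / custom OpenAI-compatible endpoints (so you can connect Codestral through Mistral or a LiteLLM proxy). Just change the model.provider field.

Q: How fast is turn-taking? A: Vapi targets an end-to-end time-to-first-byte latency of about 500-800 ms. The biggest variable is the LLM: GPT-4o-mini is the fastest, and Claude Sonnet gives the highest quality. Using OpenAI Realtime as the model brings latency down to about 300-400 ms.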
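As an illustration of the model-switching answer above, the sketch below derives a second assistant config that differs only in its `model.provider` and `model.model` fields. The helper `with_model` is hypothetical, the provider string `"anthropic"` is an assumption about the value Vapi expects, and `"your-claude-model-id"` is a placeholder rather than a real model name.

```python
import copy


def with_model(assistant, provider, model):
    """Return a copy of the assistant config that uses a different LLM."""
    updated = copy.deepcopy(assistant)  # leave the original config untouched
    updated["model"]["provider"] = provider
    updated["model"]["model"] = model
    return updated


base = {
    "name": "Acme Concierge",
    "model": {
        "provider": "openai",
        "model": "gpt-4o",
        "messages": [
            {"role": "system", "content": "You are a concierge for Acme Hotels."}
        ],
    },
}

# Placeholder model ID: substitute one that your Vapi account lists.
claude_variant = with_model(base, "anthropic", "your-claude-model-id")
```

The system prompt, voice, and transcriber settings carry over unchanged, so the same agent can be compared across providers for latency and quality.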


Related assets

Vapi Squads — Multi-Agent Voice Routing in One Call

ElevenLabs ConvAI — Full-Duplex Voice Agent Platform

Deepgram Voice Agent API — Unified STT+LLM+TTS

Cartesia Streaming WebSocket — Full-Duplex Voice Agent TTS