Workflows · May 7, 2026 · 4 min read

Vapi — Voice AI Agent Platform with STT, LLM & TTS

Vapi combines STT, LLM, TTS, and turn-taking in one voice agent API. Build phone agents in minutes on a Twilio + Deepgram + ElevenLabs + GPT-4o stack.

Agent ready

Safe staging for this asset

This asset is staged first. The copied prompt tells the agent to inspect the staged files and ask before activating scripts, MCP config, or global config.

Stage only · 29/100 · Policy: stage
Agent surface: Any MCP/CLI agent
Kind: Skill
Install: Stage only
Trust: Community
Entrypoint: Asset
Safe staging command
npx -y tokrepo@latest install 1cea9022-eb14-4c9d-ae40-dbf5948c9139 --target codex

Stages files first; activation requires review of the staged README and plan.

Intro

Vapi is a voice AI agent platform: STT (Deepgram / Whisper), LLM (GPT-4o / Claude / custom), TTS (ElevenLabs / Cartesia / PlayHT), and turn-taking glue, all exposed through one API. Spin up an outbound or inbound phone agent in about 5 minutes. Best for: founders building voice products without wiring together 5 vendor SDKs themselves. Works with: Twilio numbers, Vonage, custom SIP. Setup time: 5 minutes (sign up + a phone number).


Create your first voice agent

curl -X POST https://api.vapi.ai/assistant \
  -H "Authorization: Bearer $VAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Acme Concierge",
    "firstMessage": "Hi, this is Acme. How can I help you today?",
    "model": {
      "provider": "openai",
      "model": "gpt-4o",
      "messages": [
        {
          "role": "system",
          "content": "You are a concierge for Acme Hotels. Greet the caller, ask how you can help, and stay friendly. If they want to book, ask for dates and party size. Speak naturally and concisely."
        }
      ]
    },
    "voice": {
      "provider": "11labs",
      "voiceId": "21m00Tcm4TlvDq8ikWAM"
    },
    "transcriber": {
      "provider": "deepgram",
      "model": "nova-2"
    }
  }'
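
The same request can be issued from a script. This is a minimal Python sketch, assuming the endpoint and JSON fields shown in the curl example above, and assuming the response body is a JSON object that carries the new assistant's id (check the API reference before relying on that). The helper names build_assistant_payload and create_assistant are illustrative and not part of any Vapi SDK.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
VAPI_ASSISTANT_URL = "https://api.vapi.ai/assistant"


def build_assistant_payload(name, first_message, system_prompt):
    """Return the assistant JSON body from the curl example as a dict."""
    return {
        "name": name,
        "firstMessage": first_message,
        "model": {
            "provider": "openai",
            "model": "gpt-4o",
            "messages": [{"role": "system", "content": system_prompt}],
        },
        "voice": {"provider": "11labs", "voiceId": "21m00Tcm4TlvDq8ikWAM"},
        "transcriber": {"provider": "deepgram", "model": "nova-2"},
    }


def create_assistant(payload, api_key):
    """POST the payload and return the parsed JSON response.

    Hypothetical helper: it performs a real network call and assumes
    the API answers with a JSON object.
    """
    request = urllib.request.Request(
        VAPI_ASSISTANT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To use it, call create_assistant(build_assistant_payload(...), os.environ["VAPI_API_KEY"]) and keep the returned id for the outbound call step.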

Make an outbound call

curl -X POST https://api.vapi.ai/call/phone \
  -H "Authorization: Bearer $VAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "phoneNumberId": "your-twilio-number-id",
    "customer": { "number": "+15551234567" },
    "assistantId": "assistant-id-from-step-1"
  }'

Vapi calls the customer, plays the firstMessage, transcribes their speech in real time, sends the transcript to GPT-4o, and streams the response through ElevenLabs back to the caller, with sub-second turn-taking.
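
A malformed customer number is an easy way to waste a call attempt, so it can help to validate the body before sending it. The sketch below assumes the number should be in E.164 form, as the +15551234567 example suggests. build_call_payload is a hypothetical helper that only assembles the JSON body shown above.

```python
import re

# E.164: a plus sign, a non-zero first digit, then up to 14 more digits.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def build_call_payload(phone_number_id, customer_number, assistant_id):
    """Return the outbound-call JSON body, rejecting non-E.164 numbers."""
    if not E164_PATTERN.fullmatch(customer_number):
        raise ValueError(f"not an E.164 number: {customer_number!r}")
    return {
        "phoneNumberId": phone_number_id,
        "customer": {"number": customer_number},
        "assistantId": assistant_id,
    }
```

The resulting dict can be serialized with json.dumps and sent to the same endpoint as the curl command.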

Add custom tools

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "check_availability",
        "description": "Check room availability for given dates and party size",
        "parameters": {
          "type": "object",
          "properties": {
            "checkin": { "type": "string", "format": "date" },
            "checkout": { "type": "string", "format": "date" },
            "guests": { "type": "integer" }
          },
          "required": ["checkin", "checkout", "guests"]
        }
      },
      "server": {
        "url": "https://your-backend.example.com/check-availability"
      }
    }
  ]
}

When the LLM decides to call check_availability, Vapi POSTs the arguments to your backend URL, waits for the result, and the LLM continues the call with the data.
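
On the backend side, the handler has to read the tool call and answer with a result. The sketch below is framework-agnostic and rests on assumed shapes: a request with message.toolCallList entries carrying id, name, and arguments, and a response of the form {"results": [{"toolCallId", "result"}]}. Neither shape is given in this article, so confirm both against Vapi's tool documentation. The check_availability body is a stand-in for a real inventory lookup.

```python
import json


def check_availability(checkin, checkout, guests):
    """Stand-in for a real inventory lookup; always reports availability."""
    return {"available": True, "checkin": checkin, "checkout": checkout, "guests": guests}


def handle_tool_request(body):
    """Build the response for one tool webhook POST.

    Assumed request shape:  {"message": {"toolCallList": [{"id", "name", "arguments"}]}}
    Assumed response shape: {"results": [{"toolCallId", "result"}]}
    """
    results = []
    for call in body.get("message", {}).get("toolCallList", []):
        args = call.get("arguments", {})
        if isinstance(args, str):
            # Tolerate arguments delivered as a JSON-encoded string.
            args = json.loads(args)
        if call.get("name") == "check_availability":
            outcome = check_availability(args["checkin"], args["checkout"], args["guests"])
        else:
            outcome = {"error": f"unknown tool: {call.get('name')}"}
        results.append({"toolCallId": call.get("id"), "result": json.dumps(outcome)})
    return {"results": results}
```

Keeping the handler a pure function of the parsed body makes it easy to mount behind any web framework and to test without a live call.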

Why use Vapi vs roll-your-own

Building this stack yourself means wiring together Twilio Media Streams, a Deepgram WebSocket, your LLM, an ElevenLabs streaming WebSocket, and a state machine for voice activity detection (VAD) and barge-in. Vapi packages all of that. The trade-off: vendor lock-in on the audio pipeline.


FAQ

Q: Is Vapi free? A: Vapi has a free trial with included minutes. After that it is pay-per-minute (roughly $0.05-0.20/min, depending on the STT/LLM/TTS combination). You can also bring your own provider keys to avoid Vapi's markup. Current pricing is listed at vapi.ai/pricing.
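
To turn the per-minute range into a budget figure, multiply it by expected call volume. A small sketch, using only the approximate $0.05-0.20/min range quoted above as default rates. Those are not official prices, and estimate_monthly_cost is a hypothetical helper.

```python
def estimate_monthly_cost(minutes, low_rate=0.05, high_rate=0.20):
    """Return (low, high) USD estimates for a monthly call volume in minutes."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return (round(minutes * low_rate, 2), round(minutes * high_rate, 2))
```

For example, 3,000 minutes a month comes to between $150 and $600 under those assumed rates.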

Q: Can I use Claude instead of GPT-4o? A: Yes. Vapi supports OpenAI, Anthropic, Google, Groq, Together (including its hosted Llama models), and custom OpenAI-compatible endpoints (so you can plug in Codestral via Mistral or a LiteLLM proxy). Switch via the model.provider field, and set model.model to a model that provider offers.
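
In practice, switching means changing two fields in the assistant's model block and leaving the rest of the config alone. A minimal sketch: with_model is a hypothetical helper, and the "anthropic" provider string and the placeholder model ID in the usage note are assumptions to check against the provider list in the API reference.

```python
import copy


def with_model(assistant_config, provider, model_name):
    """Return a copy of the assistant config with a different LLM provider and model."""
    updated = copy.deepcopy(assistant_config)  # leave the caller's dict untouched
    updated["model"]["provider"] = provider
    updated["model"]["model"] = model_name
    return updated
```

For example, with_model(config, "anthropic", "your-claude-model-id") keeps the system prompt, voice, and transcriber settings and swaps only the LLM.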

Q: How fast is the turn-taking? A: Vapi targets ~500-800ms end-to-end first-byte latency. The biggest variable is the LLM: GPT-4o-mini is the faster choice, while Claude Sonnet gives higher-quality responses. With OpenAI Realtime as the model, latency drops to ~300-400ms.


Quick Use

  1. Sign up at vapi.ai and copy your API key
  2. Buy a Twilio phone number and link it in Vapi → Phone Numbers
  3. POST the assistant JSON from "Create your first voice agent" to /assistant, then POST to /call/phone to dial out


Source & Thanks

Built by Vapi. Commercial product with free trial.

vapi.ai — API documentation
