Workflows · May 7, 2026 · 4 min read

Vapi — Voice AI Agent Platform with STT, LLM & TTS

Vapi glues STT, LLM, TTS, and turn-taking into one voice agent API. Build phone agents in minutes on a Twilio + Deepgram + ElevenLabs + GPT-4o stack.

Agent-ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, an install contract, JSON metadata, a per-adapter plan, and raw content so that agents can evaluate compatibility, risk, and next steps.

Stage only · 17/100
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Stage only
Trust: New
Input: Asset
Universal CLI command
npx tokrepo install 1cea9022-eb14-4c9d-ae40-dbf5948c9139
Introduction

Vapi is the voice AI agent platform — STT (Deepgram / Whisper), LLM (GPT-4o / Claude / custom), TTS (ElevenLabs / Cartesia / PlayHT), and turn-taking glue exposed through one API. Spin up an outbound or inbound phone agent in 5 minutes. Best for: founders building voice products without composing 5 vendor SDKs themselves. Works with: Twilio numbers, Vonage, custom SIP. Setup time: 5 minutes (sign up + a phone number).


Create your first voice agent

curl -X POST https://api.vapi.ai/assistant \
  -H "Authorization: Bearer $VAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Acme Concierge",
    "firstMessage": "Hi, this is Acme. How can I help you today?",
    "model": {
      "provider": "openai",
      "model": "gpt-4o",
      "messages": [
        {
          "role": "system",
          "content": "You are a concierge for Acme Hotels. Greet the caller, ask how you can help, and stay friendly. If they want to book, ask for dates and party size. Speak naturally and concisely."
        }
      ]
    },
    "voice": {
      "provider": "11labs",
      "voiceId": "21m00Tcm4TlvDq8ikWAM"
    },
    "transcriber": {
      "provider": "deepgram",
      "model": "nova-2"
    }
  }'
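The same request can be scripted. Below is a minimal Python sketch using only the standard library; it mirrors the curl payload above and assumes your key is in the VAPI_API_KEY environment variable (build_assistant_payload and create_assistant are illustrative helper names, not part of any SDK):

```python
import json
import os
import urllib.request

API_URL = "https://api.vapi.ai/assistant"

def build_assistant_payload(name: str, first_message: str, system_prompt: str) -> dict:
    """Build the same assistant config as the curl example above."""
    return {
        "name": name,
        "firstMessage": first_message,
        "model": {
            "provider": "openai",
            "model": "gpt-4o",
            "messages": [{"role": "system", "content": system_prompt}],
        },
        "voice": {"provider": "11labs", "voiceId": "21m00Tcm4TlvDq8ikWAM"},
        "transcriber": {"provider": "deepgram", "model": "nova-2"},
    }

def create_assistant(payload: dict) -> dict:
    """POST the payload to Vapi; the response includes the assistant's id."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['VAPI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Hold on to the `id` field from the response: it is the `assistantId` you pass when dialing out in the next step.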

Make an outbound call

curl -X POST https://api.vapi.ai/call/phone \
  -H "Authorization: Bearer $VAPI_API_KEY" \
  -d '{
    "phoneNumberId": "your-twilio-number-id",
    "customer": { "number": "+15551234567" },
    "assistantId": "assistant-id-from-step-1"
  }'

Vapi calls the customer, plays the firstMessage, transcribes their speech in real time, sends the transcript to GPT-4o, and streams the response through ElevenLabs back to the caller, with sub-second turn-taking.

Add custom tools

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "check_availability",
        "description": "Check room availability for given dates and party size",
        "parameters": {
          "type": "object",
          "properties": {
            "checkin": { "type": "string", "format": "date" },
            "checkout": { "type": "string", "format": "date" },
            "guests": { "type": "integer" }
          },
          "required": ["checkin", "checkout", "guests"]
        }
      },
      "server": {
        "url": "https://your-backend.example.com/check-availability"
      }
    }
  ]
}

When the LLM decides to call check_availability, Vapi POSTs to your backend, gets the result, and the LLM continues the call with the data.
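The receiving end of that POST can be sketched with the Python standard library alone. The payload shape assumed below (tool calls under message.toolCalls, each carrying an id and OpenAI-style function.arguments, answered with a results array keyed by toolCallId) is an assumption based on common tool-calling conventions — verify the exact schema against Vapi's docs before relying on it. check_availability is a stand-in for your real inventory lookup:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def check_availability(checkin: str, checkout: str, guests: int) -> dict:
    # Stand-in business logic: pretend parties of 4 or fewer always fit.
    return {"available": guests <= 4, "checkin": checkin, "checkout": checkout}

class ToolHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        # Assumed request shape — check Vapi's tool-call schema.
        call = body["message"]["toolCalls"][0]
        args = call["function"]["arguments"]
        if isinstance(args, str):  # arguments may arrive as a JSON string
            args = json.loads(args)
        result = check_availability(args["checkin"], args["checkout"], args["guests"])
        reply = {"results": [{"toolCallId": call["id"], "result": json.dumps(result)}]}
        out = json.dumps(reply).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(out)

def serve(port: int = 8080) -> None:
    """Run the webhook server (point the tool's server.url at this host)."""
    HTTPServer(("", port), ToolHandler).serve_forever()
```

Whatever you return in `result` is handed back to the LLM verbatim, so keep it compact — the model has to read it mid-conversation.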

Why use Vapi vs roll-your-own

Building this stack manually means: Twilio Media Streams + Deepgram WebSocket + your LLM + ElevenLabs streaming WebSocket + a state machine for VAD + barge-in. Vapi packages it. The trade-off: vendor lock-in on the audio pipeline.


FAQ

Q: Is Vapi free? A: Vapi has a free trial with included minutes. After that it's pay-per-minute (~$0.05-0.20/min depending on the STT/LLM/TTS combo). You can also bring your own provider keys to avoid Vapi's markup. Pricing on vapi.ai/pricing.
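Those per-minute rates make cost projection simple arithmetic. A quick sketch, using an illustrative mid-range rate from the band above (the traffic numbers are assumptions, not benchmarks):

```python
def monthly_cost(calls_per_day: int, avg_minutes: float,
                 rate_per_min: float, days: int = 30) -> float:
    """Back-of-envelope voice bill: calls/day x minutes/call x $/min x days."""
    return calls_per_day * avg_minutes * rate_per_min * days

# 50 calls a day, 3 minutes each, at the mid-range $0.10/min:
print(monthly_cost(50, 3, 0.10))  # 450.0 dollars/month
```

Rerun it with the $0.05 and $0.20 endpoints of the band to bracket your worst and best case before committing to a stack.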

Q: Can I use Claude instead of GPT-4o? A: Yes — Vapi supports OpenAI, Anthropic, Google, Groq, Together, and custom OpenAI-compatible endpoints (so you can plug in any model behind an OpenAI-compatible proxy such as LiteLLM). Switch via the model.provider field.

Q: How fast is the turn-taking? A: Vapi targets ~500-800ms first-byte latency end-to-end. The biggest variable is the LLM — GPT-4o-mini is fastest, Claude Sonnet is highest quality. With OpenAI Realtime as the model, latency drops to ~300-400ms.
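That end-to-end figure is roughly the sum of the serial stages in a cascaded STT → LLM → TTS pipeline. A sketch with illustrative stage budgets — the individual numbers are assumptions for the exercise, not Vapi measurements:

```python
# Illustrative per-stage budgets (ms) for a cascaded voice pipeline.
BUDGET_MS = {
    "stt_endpointing": 200,   # silence detection before the turn counts as done
    "llm_first_token": 350,   # time to first streamed token (the big variable)
    "tts_first_byte": 150,    # time to first audio chunk from the TTS stream
}

def first_byte_latency(budget: dict) -> int:
    """Sum the serial stages to estimate caller-perceived response delay."""
    return sum(budget.values())

print(first_byte_latency(BUDGET_MS))  # 700 — inside the 500-800ms target
```

The decomposition also explains the OpenAI Realtime number: a speech-to-speech model collapses the LLM and TTS stages, so most of the budget disappears.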


Quick Use

  1. Sign up at vapi.ai — copy your API key
  2. Buy a Twilio phone number and link it in Vapi → Phone Numbers
  3. POST the assistant JSON above to /assistant, then /call/phone to dial out



Source & Thanks

Built by Vapi. Commercial product with free trial.

vapi.ai — API documentation

🙏

