Kong AI Gateway — Enterprise-grade LLM Proxy
Kong AI Gateway adds LLM-specific plugins (prompt transforms, semantic caching, cost limits, guardrails) to the Kong API gateway — ideal for teams already running Kong who want AI controls on the same plane.
Why Kong AI Gateway
Kong has been the default enterprise API gateway for a decade — battle-tested policy engine, plugin ecosystem, and Kubernetes-native deployment. Kong AI Gateway adds LLM-aware plugins on top: ai-proxy for provider abstraction, ai-prompt-template/ai-prompt-decorator for prompt control, ai-rate-limiting for token-based throttling, ai-semantic-cache for embedding-based caching, and ai-prompt-guard for input validation.
The value proposition is one control plane. Security, routing, rate limits, and LLM policies are all enforced by the same gateway your platform team already operates. For large enterprises, adding a purpose-built tool like Portkey next to Kong creates parallel stacks and governance overhead — Kong AI Gateway folds LLM concerns into the existing control plane.
Where it’s overkill: startups and small teams that don’t already run Kong. The ops burden of Kong (control plane, data plane, DB, plugin configuration) is real. For greenfield LLM apps, Cloudflare, Portkey, or LiteLLM ship faster.
Quick Start — Kong + ai-proxy Plugin
This config declares a Kong route, tells the ai-proxy plugin to forward to Anthropic Claude, attaches semantic cache against a Redis vector store, and sets a per-minute token rate limit. The client hits Kong with an OpenAI-shape payload; Kong handles translation, caching, and throttling transparently.
```yaml
# declarative Kong config (kong.yml) — exposes /ai-gateway/chat as an
# OpenAI-compatible endpoint backed by Anthropic Claude with semantic caching.
_format_version: "3.0"
services:
  - name: ai
    url: https://localhost   # dummy; ai-proxy handles real upstream
    routes:
      - name: chat
        paths: ["/ai-gateway/chat"]
        plugins:
          - name: ai-proxy
            config:
              route_type: "llm/v1/chat"
              auth:
                header_name: "x-api-key"
                header_value: "$ANTHROPIC_API_KEY"
              model:
                provider: "anthropic"
                name: "claude-3-5-sonnet-20241022"
              logging:
                log_statistics: true
                log_payloads: true
          - name: ai-semantic-cache
            config:
              embeddings:
                auth: { header_name: "Authorization", header_value: "Bearer $OPENAI_KEY" }
                model: { name: "text-embedding-3-small" }
              vectordb:
                dimensions: 1536
                strategy: "redis"
                threshold: 0.08
          - name: ai-rate-limiting
            config:
              llm_providers:
                - name: anthropic
                  limit: [200000]     # tokens / minute
                  window_size: [60]

# Client calls Kong instead of Anthropic directly — gets caching + rate limits:
# curl -X POST http://kong/ai-gateway/chat \
#   -H 'Content-Type: application/json' \
#   -d '{"messages":[{"role":"user","content":"hi"}]}'
```

Key Features
ai-proxy plugin
Normalizes requests to OpenAI chat completions, LLM/v1 completions, or LLM/v1 embeddings shape. Routes to OpenAI, Anthropic, Azure, Cohere, Gemini, Mistral, HuggingFace, or any OpenAI-compatible backend.
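Because ai-proxy speaks the OpenAI wire format on both sides, it can also front a self-hosted OpenAI-compatible server. A minimal sketch — the upstream URL and model name are placeholders, not real endpoints:

```yaml
# Sketch: route the same llm/v1/chat shape to a self-hosted
# OpenAI-compatible server (e.g. vLLM) instead of a hosted provider.
plugins:
  - name: ai-proxy
    config:
      route_type: "llm/v1/chat"
      model:
        provider: "openai"      # OpenAI-compatible wire format
        name: "my-local-model"  # placeholder model name
        options:
          upstream_url: "http://vllm.internal:8000/v1/chat/completions"
```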
Semantic + exact cache
ai-semantic-cache plugin embeds prompts and matches against recent cached entries. Threshold-configurable. Uses Redis, Postgres/pgvector, or external vector DBs.
Token-aware rate limiting
ai-rate-limiting counts actual tokens consumed (input + output) against configured budgets. More accurate than request-count limits for preventing runaway spend.
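Budgets can be layered over multiple windows, so a short burst allowance coexists with a longer-horizon spend cap. A sketch with illustrative values:

```yaml
# Sketch: two token budgets per provider — parallel arrays pair each
# limit with its window (200k tokens/minute and 2M tokens/hour here).
plugins:
  - name: ai-rate-limiting
    config:
      llm_providers:
        - name: anthropic
          limit: [200000, 2000000]
          window_size: [60, 3600]
```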
Prompt templates and guards
ai-prompt-template injects variables; ai-prompt-decorator prepends system messages; ai-prompt-guard blocks requests matching configured patterns (prompt injection defense).
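The decorator and guard plugins compose on one route. A sketch — the system message and deny patterns are illustrative, not recommended production rules:

```yaml
# Sketch: prepend a fixed system message, then reject requests whose
# prompts match known injection phrases.
plugins:
  - name: ai-prompt-decorator
    config:
      prompts:
        prepend:
          - role: "system"
            content: "You are a support assistant. Answer only product questions."
  - name: ai-prompt-guard
    config:
      deny_patterns:
        - "ignore previous instructions"
        - "reveal your system prompt"
```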
Kong policy integration
All standard Kong plugins apply: mTLS, OAuth2/OIDC, IP allowlists, request transformers, CORS, rate-limit-advanced. LLM routes get the same security as REST APIs.
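For instance, consumer authentication and a plain request-count limit can sit on the AI route next to the token-based one — a sketch using standard Kong plugins:

```yaml
# Sketch: ordinary Kong plugins guarding the same AI route — consumers
# must present an API key, and get a request-count cap on top of any
# token budget enforced by ai-rate-limiting.
routes:
  - name: chat
    paths: ["/ai-gateway/chat"]
    plugins:
      - name: key-auth
      - name: rate-limiting
        config:
          minute: 60   # requests per minute, per consumer
```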
Kong Manager UI + Konnect SaaS
Ops teams manage AI routes alongside existing APIs in the same Kong Manager UI. Konnect SaaS option for hosted control plane.
Comparison
| Tool | Target Audience | Integration Depth | License | Best For |
|---|---|---|---|---|
| Kong AI Gateway | Enterprises on Kong | Plugin on Kong core | Kong OSS + Enterprise | Existing Kong shops |
| Portkey | All sizes | Standalone | OSS gateway + paid cloud | Managed convenience |
| LiteLLM | All sizes | Standalone | MIT | OSS gateway + unified SDK |
| Cloudflare AI Gateway | Small/mid teams | Managed only | Proprietary | Edge-first simplicity |
Use Cases
01. Large enterprise platforms
Platform teams already operate Kong for REST APIs. Extending to AI routes keeps governance, audit, and ops consolidated. LLM policies live next to existing API policies.
02. Regulated industries
Kong Enterprise ships with the compliance certifications (SOC 2, ISO 27001, PCI) that many regulated orgs require. Keeping AI traffic inside the existing Kong perimeter, rather than onboarding a new vendor, is worth the ops load.
03. Internal AI gateway with strict SLAs
Kong’s data plane handles millions of requests per second in production. AI plugins inherit that performance baseline — overkill for 10 RPS, meaningful at 10K RPS.
Pricing & License
Kong OSS: Apache 2.0. Includes ai-proxy, ai-prompt-template, ai-prompt-decorator, ai-prompt-guard, ai-rate-limiting, ai-semantic-cache. Free self-host. Full ops burden on you.
Kong Gateway Enterprise: commercial license. Adds Kong Manager UI, RBAC, dev portal, vault, advanced plugins, enterprise support. Priced by volume and nodes — contact Kong sales.
Konnect (SaaS): managed control plane. Pairs with self-hosted data plane for hybrid model. Usage-based pricing.
Frequently Asked Questions
Do I need Kong Enterprise to use AI plugins?
No. The core ai-proxy, ai-prompt-template, ai-prompt-decorator, ai-prompt-guard, ai-rate-limiting, and ai-semantic-cache plugins ship in OSS Kong. Enterprise adds the Manager UI, enhanced plugins, and commercial support.
Kong vs Portkey for an enterprise?
Kong if you already run Kong — adding AI there is less vendor sprawl. Portkey if you want a purpose-built LLM control plane with a polished UI for non-infrastructure teams (product managers using the prompt registry, for example). Some teams run both: Kong at the edge, Portkey for prompt-level workflow.
Can Kong AI Gateway do observability?
Basic — logging plugins capture request/response and latency. For deep LLM observability (traces, evals, datasets), pair with Langfuse or Helicone. Kong handles the data plane; observability tools handle analysis.
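Shipping Kong's request logs to an external collector is one line of config — a sketch; the endpoint URL is a placeholder:

```yaml
# Sketch: forward Kong request/response logs to an external HTTP
# collector for downstream LLM analysis.
plugins:
  - name: http-log
    config:
      http_endpoint: "https://collector.internal/logs"
      method: "POST"
```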
How does ai-semantic-cache compare to Portkey caching?
Both embed prompts and match by similarity. Kong integrates with your existing Redis/Postgres infra; Portkey manages storage for you. Performance is similar — difference is operational surface area.
What if I don’t run Kong today?
For green-field LLM-only workloads, Kong AI Gateway is usually too heavy. Use LiteLLM, Portkey, or Cloudflare instead. Revisit Kong when your org adopts it at the REST API layer and you want LLM policies on the same plane.