Kong AI Gateway — Enterprise-grade LLM Proxy
Kong AI Gateway adds LLM-specific plugins (prompt transforms, semantic caching, cost limits, guardrails) to the Kong API gateway — ideal for teams already running Kong who want AI controls on the same plane.
Why Kong AI Gateway
Kong has been the default enterprise API gateway for a decade — battle-tested policy engine, plugin ecosystem, and Kubernetes-native deployment. Kong AI Gateway adds LLM-aware plugins on top: ai-proxy for provider abstraction, ai-prompt-template/ai-prompt-decorator for prompt control, ai-rate-limiting for token-based throttling, ai-semantic-cache for embedding-based caching, and ai-prompt-guard for input validation.
The value proposition is one control plane. Security, routing, rate limits, and LLM policies are all enforced by the same gateway your platform team already operates. For large enterprises, adding a purpose-built tool like Portkey next to Kong creates parallel stacks and governance overhead — Kong AI Gateway folds LLM concerns into the existing control plane.
Where it’s overkill: startups and small teams that don’t already run Kong. The ops burden of Kong (control plane, data plane, DB, plugin configuration) is real. For greenfield LLM apps, Cloudflare, Portkey, or LiteLLM ship faster.
Quick Start — Kong + ai-proxy Plugin
This config declares a Kong route, tells the ai-proxy plugin to forward to Anthropic Claude, attaches semantic cache against a Redis vector store, and sets a per-minute token rate limit. The client hits Kong with an OpenAI-shape payload; Kong handles translation, caching, and throttling transparently.
```yaml
# declarative Kong config (kong.yml) — exposes /ai-gateway/chat as an
# OpenAI-compatible endpoint backed by Anthropic Claude with semantic caching.
_format_version: "3.0"
services:
  - name: ai
    url: https://localhost   # dummy; ai-proxy handles real upstream
    routes:
      - name: chat
        paths: ["/ai-gateway/chat"]
        plugins:
          - name: ai-proxy
            config:
              route_type: "llm/v1/chat"
              auth:
                header_name: "x-api-key"
                header_value: "$ANTHROPIC_API_KEY"
              model:
                provider: "anthropic"
                name: "claude-3-5-sonnet-20241022"
              logging:
                log_statistics: true
                log_payloads: true
          - name: ai-semantic-cache
            config:
              embeddings:
                auth: { header_name: "Authorization", header_value: "Bearer $OPENAI_KEY" }
                model: { name: "text-embedding-3-small" }
              vectordb:
                dimensions: 1536
                strategy: "redis"
                threshold: 0.08
          - name: ai-rate-limiting
            config:
              llm_providers:
                - name: anthropic
                  limit: [200000]     # tokens / minute
                  window_size: [60]

# Client calls Kong instead of Anthropic directly — gets caching + rate limits:
# curl -X POST http://kong/ai-gateway/chat \
#   -H 'Content-Type: application/json' \
#   -d '{"messages":[{"role":"user","content":"hi"}]}'
```

Key Features
ai-proxy plugin
Normalizes requests to OpenAI chat completions, LLM/v1 completions, or LLM/v1 embeddings shape. Routes to OpenAI, Anthropic, Azure, Cohere, Gemini, Mistral, HuggingFace, or any OpenAI-compatible backend.
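Because ai-proxy speaks the OpenAI wire format on both sides, it can also front a self-hosted OpenAI-compatible server. A minimal sketch — the upstream URL and model name are placeholders, not real endpoints:

```yaml
# Sketch: route the same llm/v1/chat shape to a self-hosted
# OpenAI-compatible server (e.g. vLLM) instead of a hosted provider.
plugins:
  - name: ai-proxy
    config:
      route_type: "llm/v1/chat"
      model:
        provider: "openai"      # OpenAI-compatible wire format
        name: "my-local-model"  # placeholder model name
        options:
          upstream_url: "http://vllm.internal:8000/v1/chat/completions"
```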
Semantic + exact cache
ai-semantic-cache plugin embeds prompts and matches against recent cached entries. Threshold-configurable. Uses Redis, Postgres/pgvector, or external vector DBs.
Token-aware rate limiting
ai-rate-limiting counts actual tokens consumed (input + output) against configured budgets. More accurate than request-count limits for preventing runaway spend.
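Budgets can be layered over multiple windows, so a short burst allowance coexists with a longer-horizon spend cap. A sketch with illustrative values:

```yaml
# Sketch: two token budgets per provider — parallel arrays pair each
# limit with its window (200k tokens/minute and 2M tokens/hour here).
plugins:
  - name: ai-rate-limiting
    config:
      llm_providers:
        - name: anthropic
          limit: [200000, 2000000]
          window_size: [60, 3600]
```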
Prompt templates and guards
ai-prompt-template injects variables; ai-prompt-decorator prepends system messages; ai-prompt-guard blocks requests matching configured patterns (prompt injection defense).
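The decorator and guard plugins compose on one route. A sketch — the system message and deny patterns are illustrative, not recommended production rules:

```yaml
# Sketch: prepend a fixed system message, then reject requests whose
# prompts match known injection phrases.
plugins:
  - name: ai-prompt-decorator
    config:
      prompts:
        prepend:
          - role: "system"
            content: "You are a support assistant. Answer only product questions."
  - name: ai-prompt-guard
    config:
      deny_patterns:
        - "ignore previous instructions"
        - "reveal your system prompt"
```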
Kong policy integration
All standard Kong plugins apply: mTLS, OAuth2/OIDC, IP allowlists, request transformers, CORS, rate-limit-advanced. LLM routes get the same security as REST APIs.
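For instance, consumer authentication and a plain request-count limit can sit on the AI route next to the token-based one — a sketch using standard Kong plugins:

```yaml
# Sketch: ordinary Kong plugins guarding the same AI route — consumers
# must present an API key, and get a request-count cap on top of any
# token budget enforced by ai-rate-limiting.
routes:
  - name: chat
    paths: ["/ai-gateway/chat"]
    plugins:
      - name: key-auth
      - name: rate-limiting
        config:
          minute: 60   # requests per minute, per consumer
```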
Kong Manager UI + Konnect SaaS
Ops teams manage AI routes alongside existing APIs in the same Kong Manager UI. Konnect SaaS option for hosted control plane.
Comparison
| Tool | Target Audience | Integration Depth | License | Best For |
|---|---|---|---|---|
| Kong AI Gateway | Enterprises on Kong | Plugin on Kong core | Kong OSS + Enterprise | Existing Kong shops |
| Portkey | All sizes | Standalone | OSS gateway + paid cloud | Managed convenience |
| LiteLLM | All sizes | Standalone | MIT | OSS gateway + unified SDK |
| Cloudflare AI Gateway | Small/mid teams | Managed only | Proprietary | Edge-first simplicity |
Use Cases
01. Large enterprise platforms
Platform teams already operate Kong for REST APIs. Extending to AI routes keeps governance, audit, and ops consolidated. LLM policies live next to existing API policies.
02. Regulated industries
Kong Enterprise ships with the compliance certifications (SOC 2, ISO 27001, PCI) that many regulated orgs require. Keeping AI traffic inside the existing Kong perimeter, rather than onboarding a new vendor, is worth the ops load.
03. Internal AI gateway with strict SLAs
Kong’s data plane handles millions of requests per second in production. AI plugins inherit that performance baseline — overkill for 10 RPS, meaningful at 10K RPS.
Pricing & License
Kong OSS: Apache 2.0. Includes ai-proxy, ai-prompt-template, ai-prompt-decorator, ai-prompt-guard, ai-rate-limiting, ai-semantic-cache. Free self-host. Full ops burden on you.
Kong Gateway Enterprise: commercial license. Adds Kong Manager UI, RBAC, dev portal, vault, advanced plugins, enterprise support. Priced by volume and nodes — contact Kong sales.
Konnect (SaaS): managed control plane. Pairs with self-hosted data plane for hybrid model. Usage-based pricing.
Frequently Asked Questions
Do I need Kong Enterprise to use AI plugins?
No. The core ai-proxy, ai-prompt-template, ai-prompt-decorator, ai-prompt-guard, ai-rate-limiting, and ai-semantic-cache plugins ship in OSS Kong. Enterprise adds the Manager UI, enhanced plugins, and commercial support.
Kong vs Portkey for an enterprise?
Kong if you already run Kong — adding AI there is less vendor sprawl. Portkey if you want a purpose-built LLM control plane with a polished UI for non-infrastructure teams (product managers using the prompt registry, for example). Some teams run both: Kong at the edge, Portkey for prompt-level workflow.
Can Kong AI Gateway do observability?
Basic — logging plugins capture request/response and latency. For deep LLM observability (traces, evals, datasets), pair with Langfuse or Helicone. Kong handles the data plane; observability tools handle analysis.
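Shipping Kong's request logs to an external collector is one line of config — a sketch; the endpoint URL is a placeholder:

```yaml
# Sketch: forward Kong request/response logs to an external HTTP
# collector for downstream LLM analysis.
plugins:
  - name: http-log
    config:
      http_endpoint: "https://collector.internal/logs"
      method: "POST"
```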
How does ai-semantic-cache compare to Portkey caching?
Both embed prompts and match by similarity. Kong integrates with your existing Redis/Postgres infra; Portkey manages storage for you. Performance is similar — difference is operational surface area.
What if I don’t run Kong today?
For green-field LLM-only workloads, Kong AI Gateway is usually too heavy. Use LiteLLM, Portkey, or Cloudflare instead. Revisit Kong when your org adopts it at the REST API layer and you want LLM policies on the same plane.