Cloudflare AI Gateway — Edge Proxy for LLM Traffic

Cloudflare AI Gateway is a free edge proxy that sits between your application and your LLM providers: it caches responses, rate-limits to prevent abuse, fails over between models, and emits analytics, all without changing your SDK code.

Why Choose It

The cheapest "I need production-grade LLM infrastructure right now" answer. Cloudflare AI Gateway is free at the Workers free tier, deploys in minutes, and supports OpenAI, Anthropic, Gemini, Groq, Mistral, Workers AI, and a dozen other providers without SDK changes — just change the base URL.

The trade-off is opinionated simplicity. You get caching, rate-limiting, retry/fallback, and a dashboard with request logs and spend tracking. You don’t get Portkey’s prompt management, LiteLLM’s extensive routing rules, or Langfuse-depth traces. For a startup shipping its first LLM feature, that trade is almost always correct.

The Cloudflare edge network is the hidden benefit. Because the gateway runs at 300+ POPs, LLM requests hit a nearby Cloudflare edge first, then Cloudflare reaches out to the provider from a warm connection. On cache hits (a surprising fraction of real traffic) you return in milliseconds without hitting the provider at all.

Quick Start — Switch Base URL, Nothing Else

The only change is base_url. The gateway supports OpenAI, Anthropic, Gemini, Workers AI, Groq, Mistral, Perplexity, HuggingFace, Replicate, Cohere, Azure, AWS Bedrock, and Vertex AI — each under its own path segment. Caching, retries, and fallbacks are configured in the dashboard, not in code.

# 1. In Cloudflare dashboard: AI → AI Gateway → Create gateway
#    → You get a base URL like https://gateway.ai.cloudflare.com/v1/<account>/<gateway>
#
# 2. Point your SDK at it. Everything else stays the same.

from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="https://gateway.ai.cloudflare.com/v1/<account>/<gateway>/openai",
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize the AI gateway category."}],
    # Cache identical requests for 1 hour
    extra_headers={"cf-aig-cache-ttl": "3600"},
)
print(resp.choices[0].message.content)

# Anthropic? Same gateway, different path segment:
# base_url="https://gateway.ai.cloudflare.com/v1/<account>/<gateway>/anthropic"
# Dashboard now shows logs, cache hits, per-provider spend, and failure rates.

Core Capabilities

Drop-in base URL

No SDK change, no new client library. Your existing OpenAI/Anthropic code keeps working after you change the base URL. Zero risk migration.
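The per-provider path convention is simple enough to capture in a small helper. A sketch (the account ID and gateway name below, "my-account" and "my-gateway", are illustrative placeholders; the URL pattern follows the quick start above):

```python
def gateway_base_url(account_id: str, gateway: str, provider: str) -> str:
    """Build the per-provider base URL for a Cloudflare AI Gateway.

    Provider slugs ("openai", "anthropic", ...) map to path segments
    under the same gateway, per the pattern in the quick start.
    """
    return f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway}/{provider}"

# Point each SDK at its own segment of the same gateway:
openai_url = gateway_base_url("my-account", "my-gateway", "openai")
anthropic_url = gateway_base_url("my-account", "my-gateway", "anthropic")
```

Because only the final path segment changes, switching providers (or adding a second one) never touches client code beyond the base URL string.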

Semantic + exact cache

Identical requests are cached by default. Semantic cache (paid) matches near-duplicate prompts via embeddings — typical ~20-40% hit rate on real traffic.

Per-provider fallback

Configure automatic failover: try Anthropic first, fall back to OpenAI on timeout or 5xx. Reduces incident impact without client-side code.
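Besides dashboard configuration, Cloudflare also offers a Universal Endpoint where the fallback chain is declared per request as an ordered list of provider steps. A hedged sketch of the payload shape (field names follow Cloudflare's docs at the time of writing; verify against the current documentation before relying on them):

```python
import json

# Ordered fallback chain: try Anthropic first, fall back to OpenAI.
# Each step names a provider, its endpoint path, headers, and request body.
fallback_chain = [
    {
        "provider": "anthropic",
        "endpoint": "messages",
        "headers": {"x-api-key": "sk-ant-...", "anthropic-version": "2023-06-01"},
        "query": {
            "model": "claude-3-5-sonnet-20240620",
            "max_tokens": 256,
            "messages": [{"role": "user", "content": "Hello"}],
        },
    },
    {
        "provider": "openai",
        "endpoint": "chat/completions",
        "headers": {"authorization": "Bearer sk-..."},
        "query": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": "Hello"}],
        },
    },
]

# POST this JSON array to the bare gateway URL (no provider path segment):
#   https://gateway.ai.cloudflare.com/v1/<account>/<gateway>
payload = json.dumps(fallback_chain)
```

The gateway walks the list in order and returns the first successful response, so the client sees one request even when the primary provider times out.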

Rate limits by user / route

Cap request volume per custom identifier (user ID, API key). Useful for free-tier products and abuse prevention. Configurable per gateway.

Request logs & spend dashboard

Every request logged with prompt, response, latency, cost. Filter by model, status, custom tags. Adequate for ops; not a replacement for Langfuse-depth tracing.
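Custom tags ride along with each request as a metadata header, so logs can be filtered per user or per feature. The `cf-aig-metadata` header is documented by Cloudflare; treat the specific key names below ("user_id", "plan") as illustrative assumptions:

```python
import json

def user_headers(user_id: str, plan: str) -> dict:
    """Attach custom metadata to a gateway request so that log
    entries can be filtered by user and plan in the dashboard."""
    return {"cf-aig-metadata": json.dumps({"user_id": user_id, "plan": plan})}

headers = user_headers("user-123", "free")
# Pass via extra_headers=... on the SDK call, like cf-aig-cache-ttl above.
```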

Edge network performance

Gateway runs at 300+ POPs. Cache hits return in ~10ms regardless of provider region. Even misses benefit from Cloudflare’s warm upstream connections.

Comparison

|                       | Deployment           | Cost                      | Prompt Mgmt            | Observability Depth      |
|-----------------------|----------------------|---------------------------|------------------------|--------------------------|
| Cloudflare AI Gateway | Managed edge         | Free tier + pay-as-you-go | No                     | Basic (logs, spend)      |
| Portkey               | Managed + self-host  | Paid plans                | Yes (versioning + A/B) | Medium                   |
| LiteLLM Proxy         | Self-host            | Free (OSS)                | Partial                | Integrates with Langfuse |
| Kong AI Gateway       | Self-host enterprise | Kong license              | Via Kong plugins       | Via Kong ecosystem       |

Real-World Use Cases

01. Early-stage startups

First LLM feature shipped. Cloudflare gateway adds caching, failover, and cost visibility in an afternoon — before you need a dedicated observability stack.

02. High-traffic consumer apps

When a significant fraction of prompts are near-duplicates (chatbots, search suggestions), Cloudflare’s edge cache saves both latency and LLM spend.

03. Teams already on Cloudflare

Workers, Pages, D1, R2 users get native integration. AI Gateway fits into the existing Cloudflare account, bindings, and observability — no new vendor.

Pricing & Licensing

Free tier: the first 100K logged requests per month are free. Unlogged requests (pure passthrough) have no hard cap but may be rate-limited under extreme load.

Paid tier: usage-based beyond the free tier. Semantic caching and extended log retention are paid add-ons. Current pricing on Cloudflare docs.

Hidden savings: the most impactful "cost" of this product is negative — cache hits reduce your LLM bill directly. A startup chat app paying $2K/month on OpenAI can cut that 15-30% by enabling aggressive caching on repeated prompts.
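The back-of-envelope math behind that claim, using the numbers above:

```python
monthly_llm_spend = 2_000          # $/month on OpenAI, from the example above
savings_rate = (0.15, 0.30)        # plausible reduction range with aggressive caching

low, high = (monthly_llm_spend * r for r in savings_rate)
print(f"Estimated monthly savings: ${low:.0f}-${high:.0f}")  # $300-$600
```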


FAQ

Is Cloudflare AI Gateway really free?

The free tier covers 100K logged requests per month, which is enough for most small-to-mid apps. Beyond that, pricing is usage-based. Unlogged passthrough is uncapped but unmonitored — most teams log everything.

Does it support Anthropic Claude?

Yes. Supported providers in 2026 include OpenAI, Anthropic, Google Gemini, Groq, Mistral, Workers AI, Cohere, HuggingFace, Replicate, Perplexity, Azure OpenAI, AWS Bedrock, and Vertex AI. Each sits under its own path segment of the gateway URL.

How does semantic caching work?

Instead of exact-match caching, semantic cache embeds the incoming prompt and matches against embeddings of recent prompts. When a close enough match is found (configurable threshold), the cached response is returned. Typical hit rates: 20-40% on high-repetition workloads. Embedding cost is small relative to skipped LLM calls.
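The matching logic can be sketched with a toy cosine-similarity cache. In production the embeddings come from an embedding model; the three-dimensional vectors and the 0.9 threshold below are illustrative assumptions, not Cloudflare's actual parameters:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

class SemanticCache:
    """Return a cached response when a new prompt's embedding is
    close enough to one seen before."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, embedding):
        for cached_emb, response in self.entries:
            if cosine(embedding, cached_emb) >= self.threshold:
                return response  # semantic hit: skip the LLM call
        return None

    def put(self, embedding, response):
        self.entries.append((embedding, response))

cache = SemanticCache()
cache.put([1.0, 0.0, 0.1], "cached answer")
hit = cache.get([0.98, 0.05, 0.12])   # near-duplicate prompt -> hit
miss = cache.get([0.0, 1.0, 0.0])     # unrelated prompt -> miss
```

The threshold is the key tuning knob: too low and unrelated prompts get stale answers, too high and near-duplicates miss the cache.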

Is this a full observability platform?

No — it’s a gateway with basic observability. For deeper tracing (tool calls, chains, spans), pair Cloudflare AI Gateway with Langfuse or Helicone. Cloudflare handles ingress and caching; Langfuse handles structured traces and evals.

Can I run a self-hosted version?

No. Cloudflare AI Gateway is a managed product. For self-hosted alternatives, look at LiteLLM Proxy or Kong AI Gateway. Many teams run both — Cloudflare at the edge for global caching, LiteLLM for internal routing policies.

Does it work with the Vercel AI SDK or LangChain?

Yes. Both libraries accept a custom baseURL for OpenAI-compatible providers. Point them at your Cloudflare gateway URL and the rest works unchanged.
