Configs · Apr 6, 2026 · 2 min read

Cloudflare AI Gateway — LLM Proxy, Cache & Analytics

Free proxy gateway for LLM API calls with caching, rate limiting, cost tracking, and fallback routing across providers. Response caching can reduce costs by up to 95%. 7,000+ stars.

AI · Open Source · Community
Quick Use

Use it first, then decide how deep to go

Copy the snippet below to start routing calls through the gateway; no other code changes are needed.

  1. Sign up at dash.cloudflare.com (free tier available)
  2. Navigate to AI > AI Gateway > Create Gateway
  3. Replace your API base URL:
# Before
from openai import OpenAI
client = OpenAI(base_url="https://api.openai.com/v1")

# After — route through Cloudflare AI Gateway
# (substitute your own {account_id} and {gateway_name})
client = OpenAI(
    base_url="https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_name}/openai"
)
# Same API key, same code — now with caching, logging, and analytics

Intro

Cloudflare AI Gateway is a free proxy that sits between your application and LLM providers, adding caching, rate limiting, cost analytics, and fallback routing (7,000+ GitHub stars). Route API calls through the gateway without changing your code — just swap the base URL. Cached responses can reduce LLM costs by up to 95% for repeated queries. Best suited to teams running LLM applications in production that need cost control and observability. Works with OpenAI, Anthropic, Google AI, Azure, HuggingFace, and Workers AI. Setup time: under 5 minutes.


Key Features

Response Caching

Cache identical LLM requests to avoid paying twice:

First call: "Summarize this doc" → hits API → $0.03 → cached
Second call: same prompt → cache hit → $0.00 → <10ms

Configurable TTL from 1 minute to 30 days.
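The TTL can also be tuned per request via gateway request headers. A minimal sketch; the header names (`cf-aig-cache-ttl`, `cf-aig-skip-cache`) follow Cloudflare's documented naming but should be verified against the current docs:

```python
def gateway_headers(cache_ttl_seconds=None, skip_cache=False):
    """Build extra headers for a request routed through the gateway."""
    headers = {}
    if cache_ttl_seconds is not None:
        # Cache this response for the given number of seconds
        headers["cf-aig-cache-ttl"] = str(cache_ttl_seconds)
    if skip_cache:
        # Bypass the cache entirely for this request
        headers["cf-aig-skip-cache"] = "true"
    return headers

# Example: cache responses for one hour
print(gateway_headers(cache_ttl_seconds=3600))
```

With the OpenAI Python client, these can be passed through its `default_headers` parameter so every call picks them up.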

Cost Analytics

Real-time dashboard showing:

  • Total requests and tokens per model
  • Cost breakdown by provider
  • Cache hit rate
  • Error rate and latency percentiles
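The cache hit rate from the dashboard translates directly into savings, since cached hits cost nothing. A quick back-of-the-envelope sketch:

```python
def effective_cost(cost_per_call, total_requests, cache_hit_rate):
    """Spend scales with the miss rate: cached hits cost $0."""
    misses = total_requests * (1 - cache_hit_rate)
    return cost_per_call * misses

# 10,000 requests at $0.03 each with a 60% cache hit rate
print(effective_cost(0.03, 10_000, 0.60))  # ~$120 instead of $300
```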

Rate Limiting

Protect your API budget:

Rules:
- Max 100 requests/minute per user
- Max $50/day total spend
- Alert at 80% budget threshold
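The gateway enforces these rules server-side; for intuition, the per-user request cap behaves like a sliding-window limiter. A client-side sketch (illustrative only, not gateway code):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds."""

    def __init__(self, limit=100, window=60.0):
        self.limit, self.window = limit, window
        self.calls = deque()  # timestamps of recent calls

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False
```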

Provider Fallbacks

Automatic failover between providers:

{
  "providers": ["openai", "anthropic", "azure"],
  "fallback": true,
  "retry": { "attempts": 3, "backoff": "exponential" }
}

If OpenAI is down, requests automatically route to Anthropic.
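The JSON above is declarative config that the gateway applies for you; the equivalent client-side logic looks roughly like this sketch (function names are illustrative, not part of any SDK):

```python
import time

def call_with_fallback(providers, send, attempts=3, base_delay=1.0):
    """Try each provider in order; retry with exponential backoff
    before failing over. `send(provider)` performs one request."""
    last_error = None
    for provider in providers:
        for attempt in range(attempts):
            try:
                return send(provider)
            except Exception as err:
                last_error = err
                time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f"all providers failed: {last_error}")
```

For example, with `providers=["openai", "anthropic", "azure"]`, an OpenAI outage exhausts its retries and the call transparently moves on to Anthropic.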

Logging & Debugging

Every request logged with full details:

  • Input/output tokens
  • Latency breakdown
  • Model used
  • Cache status
  • Error details

Supported Providers

Provider      Endpoint Pattern
OpenAI        /{gateway}/openai
Anthropic     /{gateway}/anthropic
Google AI     /{gateway}/google-ai-studio
Azure         /{gateway}/azure-openai
HuggingFace   /{gateway}/huggingface
Workers AI    /{gateway}/workers-ai
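The table maps directly onto URL construction. A small helper, assuming the path segments match the table above:

```python
# Path segments per provider (from the table above)
PROVIDER_PATHS = {
    "openai": "openai",
    "anthropic": "anthropic",
    "google-ai-studio": "google-ai-studio",
    "azure-openai": "azure-openai",
    "huggingface": "huggingface",
    "workers-ai": "workers-ai",
}

def gateway_base_url(account_id, gateway_name, provider):
    """Build the gateway base URL for a given provider."""
    path = PROVIDER_PATHS[provider]
    return f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_name}/{path}"

print(gateway_base_url("acct123", "my-gateway", "openai"))
```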

Key Stats

  • 7,000+ GitHub stars
  • Free tier available
  • Up to 95% cost reduction with caching
  • 6+ provider integrations
  • Real-time analytics dashboard

FAQ

Q: What is Cloudflare AI Gateway? A: A free proxy gateway that adds caching, rate limiting, analytics, and fallback routing to LLM API calls without code changes — just swap the base URL.

Q: Is AI Gateway free? A: Yes, free tier includes 10,000 requests/day. Paid plans for higher volume.

Q: Does it add latency? A: Minimal — Cloudflare edge network adds <5ms. Cache hits are <10ms vs 500ms+ for API calls.



Source & Thanks

Created by Cloudflare. Licensed under Apache 2.0.

ai-gateway — ⭐ 7,000+

Thanks to Cloudflare for making LLM cost control accessible to every developer.
