Key Features
Response Caching
Cache identical LLM requests to avoid paying twice:
First call: "Summarize this doc" → hits API → $0.03 → cached
Second call: same prompt → cache hit → $0.00 → <10ms

Configurable TTL from 1 minute to 30 days.
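The cache only helps when requests are byte-identical, since the lookup is keyed on the request itself. A toy sketch of that lookup logic (an illustration of the idea, not the gateway's actual implementation), keyed on model plus prompt:

```python
import hashlib
import json

cache: dict[str, str] = {}

def cached_completion(model: str, prompt: str) -> tuple[str, bool]:
    """Return (response, was_cache_hit). The upstream API call is faked."""
    key = hashlib.sha256(json.dumps([model, prompt]).encode()).hexdigest()
    if key in cache:
        return cache[key], True           # cache hit: no API cost
    response = f"summary-of:{prompt}"     # stand-in for the real API call
    cache[key] = response                 # store for identical future requests
    return response, False                # cache miss: billed as usual

_, hit1 = cached_completion("gpt-4o-mini", "Summarize this doc")
_, hit2 = cached_completion("gpt-4o-mini", "Summarize this doc")
# hit1 is False (paid call); hit2 is True (served from cache)
```

Any change to the prompt or model produces a different key, so a reworded prompt is a fresh, billed request.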
Cost Analytics
Real-time dashboard showing:
- Total requests and tokens per model
- Cost breakdown by provider
- Cache hit rate
- Error rate and latency percentiles
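All of these metrics fall out of per-request records. A sketch of how cache hit rate, total cost, and a latency percentile could be computed from such records (the field names here are illustrative, not the gateway's log schema):

```python
from statistics import quantiles

# Illustrative per-request records; real gateway logs have their own schema.
requests = [
    {"model": "gpt-4o-mini", "cost": 0.03, "cached": False, "latency_ms": 540},
    {"model": "gpt-4o-mini", "cost": 0.00, "cached": True,  "latency_ms": 8},
    {"model": "claude-3-haiku", "cost": 0.02, "cached": False, "latency_ms": 610},
    {"model": "gpt-4o-mini", "cost": 0.00, "cached": True,  "latency_ms": 9},
]

hit_rate = sum(r["cached"] for r in requests) / len(requests)   # 0.5
total_cost = sum(r["cost"] for r in requests)                   # 0.05
latencies = [r["latency_ms"] for r in requests]
p95_latency = quantiles(latencies, n=100, method="inclusive")[94]
```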
Rate Limiting
Protect your API budget:
Rules:
- Max 100 requests/minute per user
- Max $50/day total spend
- Alert at 80% budget threshold

Provider Fallbacks
Automatic failover between providers:
```json
{
  "providers": ["openai", "anthropic", "azure"],
  "fallback": true,
  "retry": { "attempts": 3, "backoff": "exponential" }
}
```

If OpenAI is down, requests automatically route to Anthropic.
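The retry-then-fall-through behaviour described by that config can be sketched as a loop over providers with exponential backoff between attempts (a simplified illustration, not Cloudflare's implementation):

```python
import time

def call_with_fallback(providers, send, attempts=3, base_delay=0.5):
    """Try each provider in order; retry each one with exponential backoff.

    `send` is a callable (provider) -> response that raises on failure.
    """
    last_error = None
    for provider in providers:
        for attempt in range(attempts):
            try:
                return send(provider)
            except Exception as exc:
                last_error = exc
                # exponential backoff: base_delay, 2x, 4x, ...
                time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("all providers failed") from last_error

# Simulate OpenAI being down: requests fall through to Anthropic.
def send(provider):
    if provider == "openai":
        raise ConnectionError("openai unavailable")
    return f"ok:{provider}"

result = call_with_fallback(["openai", "anthropic", "azure"], send, base_delay=0)
# result == "ok:anthropic"
```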
Logging & Debugging
Every request logged with full details:
- Input/output tokens
- Latency breakdown
- Model used
- Cache status
- Error details
Supported Providers
| Provider | Endpoint Pattern |
|---|---|
| OpenAI | /{gateway}/openai |
| Anthropic | /{gateway}/anthropic |
| Google AI | /{gateway}/google-ai-studio |
| Azure | /{gateway}/azure-openai |
| HuggingFace | /{gateway}/huggingface |
| Workers AI | /{gateway}/workers-ai |
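Every provider shares the same URL shape, so switching providers is just a path change. A small helper building the endpoint from the pattern above, assuming the `gateway.ai.cloudflare.com/v1/{account_id}` prefix of Cloudflare's hosted endpoint (the account ID and gateway name are placeholders):

```python
GATEWAY_BASE = "https://gateway.ai.cloudflare.com/v1"

def gateway_url(account_id: str, gateway: str, provider: str) -> str:
    """Build the per-provider endpoint following the /{gateway}/{provider} pattern."""
    return f"{GATEWAY_BASE}/{account_id}/{gateway}/{provider}"

url = gateway_url("ACCOUNT_ID", "my-gateway", "openai")
# url == "https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/my-gateway/openai"
```

In client libraries that accept a custom base URL, passing this string in place of the provider's default endpoint is typically the only change needed.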
Key Stats
- 7,000+ GitHub stars
- Free tier available
- Up to 95% cost reduction with caching
- 6+ provider integrations
- Real-time analytics dashboard
FAQ
Q: What is Cloudflare AI Gateway?
A: A free proxy gateway that adds caching, rate limiting, analytics, and fallback routing to LLM API calls without code changes; just swap the base URL.

Q: Is AI Gateway free?
A: Yes, the free tier includes 10,000 requests/day. Paid plans cover higher volume.

Q: Does it add latency?
A: Minimal: the Cloudflare edge network adds <5ms. Cache hits return in <10ms vs 500ms+ for API calls.