OpenRouter — Unified API for 300+ Models, One Invoice
OpenRouter is a managed router that exposes 300+ LLMs (OpenAI, Claude, Gemini, open-source via Groq/Together/Fireworks) behind a single OpenAI-compatible API and one consolidated bill.
Why OpenRouter
OpenRouter solves a specific pain: "I want to try 10 models this week without signing 10 vendor contracts." Top up credit once, access every major closed and open model with one API key, and switch models by changing a string. Pricing is pay-per-token — a small markup over direct provider prices in exchange for zero setup and unified billing.
It’s the fastest way to benchmark models on your workload. Prompt caching, streaming, tool calls, and vision all work uniformly. You can A/B test Claude 3.5 Sonnet against Gemini 2.0 Pro against Llama 3.3 70B in an afternoon.
Where it’s not the right answer: when you need direct vendor relationships (enterprise contracts, zero-retention SLAs, regional data residency) or when the per-token markup matters at your volume. At 10M+ tokens per month, going direct (e.g., running LiteLLM as a proxy over your own provider keys) is often cheaper and gives you contractual leverage.
Quick Start — OpenAI SDK + Model String
HTTP-Referer and X-Title are optional but recommended — they make your app show up on the OpenRouter leaderboard (useful for attribution). Model names follow provider/model-slug. The "openrouter/auto" model leaves routing to OpenRouter’s cost optimizer.
# pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="sk-or-...",
    base_url="https://openrouter.ai/api/v1",
    default_headers={
        "HTTP-Referer": "https://tokrepo.com",
        "X-Title": "TokRepo AI Gateway Example",
    },
)

# Switch models by changing the string — same code path
for model in [
    "anthropic/claude-3.5-sonnet",
    "openai/gpt-4o-mini",
    "google/gemini-2.0-flash-001",
    "meta-llama/llama-3.3-70b-instruct",
]:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Name one thing you do best."}],
        max_tokens=80,
    )
    print(f"{model}: {resp.choices[0].message.content.strip()}")
    print(f"  usage: {resp.usage.total_tokens} tokens")

# Advanced: let OpenRouter pick the cheapest available provider
# model="openrouter/auto" → auto-routes based on cost + availability.
# Or use OR-specific params for provider preferences and fallbacks.
Key Features
300+ models behind one API
Closed models (OpenAI, Anthropic, Google, Cohere), major open models (Llama, Mistral, Qwen, DeepSeek), and specialty models (Perplexity online, vision models). One API key for everything.
Automatic provider fallback
OpenRouter keeps multiple upstream providers per open model (Groq, Together, Fireworks, Anyscale). If one is down or slow, it retries with another transparently.
Pay-per-token, no minimums
Top up credit, pay only for what you use. No monthly fees, no per-provider subscriptions. Cost visible per-request in response headers.
Provider preferences
Request-time headers to prefer specific providers, regions, or pricing tiers. Useful for compliance ("EU providers only") or performance ("prefer Groq").
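A minimal sketch of what a provider-preference request can look like through the OpenAI SDK, using OpenRouter's request-body `provider` object passed via `extra_body`. The field names (`order`, `allow_fallbacks`, `data_collection`) follow OpenRouter's documented provider-routing options — verify them against the current API reference before relying on this shape.

```python
# Sketch: OpenRouter provider routing preferences via extra_body.
# Field names are assumptions from OpenRouter's docs — double-check them.

def provider_prefs(order, allow_fallbacks=True, data_collection="deny"):
    """Build the OpenRouter-specific `provider` routing object."""
    return {
        "provider": {
            "order": order,                      # try these upstreams first
            "allow_fallbacks": allow_fallbacks,  # fall through if they fail
            "data_collection": data_collection,  # "deny" = no-logging providers only
        }
    }

extra = provider_prefs(["Groq", "Together"])
# client.chat.completions.create(
#     model="meta-llama/llama-3.3-70b-instruct",
#     messages=[{"role": "user", "content": "hi"}],
#     extra_body=extra,
# )
print(extra["provider"]["order"])
```

The OpenAI SDK merges anything in `extra_body` into the request JSON, which is how OpenRouter-specific parameters ride along without breaking OpenAI compatibility.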
Free tier models
A rotating set of free models (e.g., some smaller Llama and Gemma variants) for experimentation. Rate-limited but useful for prototyping.
App attribution
Apps can register with OpenRouter for leaderboards and default routing rules. Good distribution channel for public AI tools.
Comparison
| Tool | Type | Model Count | Billing | Self-host? |
|---|---|---|---|---|
| OpenRouter (this) | Managed router | 300+ | Unified (top-up + per-token) | No |
| LiteLLM | Self-hosted proxy + SDK | 100+ providers | BYO keys per provider | Yes |
| Together AI | Hosted open-source inference | ~50 OSS models | Per-token | No |
| Groq | Specialty fast inference | ~20 OSS models | Per-token | No |
Use Cases
01. Model benchmarking
Run your actual prompts against a dozen models in an afternoon. Compare quality and cost before committing to a primary provider.
02. Fast prototyping
Side projects, weekend hacks, demos — one top-up, every model available. Avoids the "I only want $5 of Claude" friction of direct vendor signup.
03. Apps that let users pick a model
Chatbots and AI wrappers that expose model choice to end users. OpenRouter is the cleanest way to offer 10+ options without 10+ integrations.
Pricing & License
Per-token pricing: direct upstream cost plus a small (typically 5-10%) markup. Exact rates per model shown at openrouter.ai/models. No monthly fees.
Free tier: limited free models (rate-limited, rotating list) for experimentation. Useful for dev/testing without spend.
At scale, compare against direct: for single-model high-volume workloads, direct provider relationships often beat OpenRouter’s markup. OpenRouter wins on flexibility and multi-model cost; direct wins on volume discounts and compliance.
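The break-even math above is easy to sketch. All rates here are hypothetical placeholders — look up real per-model prices at openrouter.ai/models and on each provider's pricing page.

```python
# Back-of-envelope: routed vs direct cost at volume.
# The $3.00/1M-token rate and 5% markup are hypothetical examples.

def monthly_cost(tokens, price_per_mtok, markup=0.0):
    """Cost in USD for `tokens` tokens at `price_per_mtok` dollars per 1M tokens."""
    return tokens / 1_000_000 * price_per_mtok * (1 + markup)

tokens = 10_000_000  # 10M tokens/month, the break-even zone mentioned above
direct = monthly_cost(tokens, price_per_mtok=3.00)               # direct contract
routed = monthly_cost(tokens, price_per_mtok=3.00, markup=0.05)  # +5% markup

print(f"direct: ${direct:.2f}")            # $30.00
print(f"routed: ${routed:.2f}")            # $31.50
print(f"delta:  ${routed - direct:.2f}")   # $1.50/month
```

At this scale the markup is pocket change; the comparison only tips toward direct contracts once volume discounts, committed-use pricing, or compliance terms enter the picture.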
Related Assets on TokRepo
LLM Gateway Comparison — Proxy Your AI Requests
Compare top LLM gateway and proxy tools for routing AI requests. Covers LiteLLM, Bifrost, Portkey, and OpenRouter for cost optimization, failover, and multi-provider access.
OpenRouter — Unified LLM API with Smart Routing
Single API endpoint for 200+ LLM models with automatic fallbacks, price comparison, and usage tracking. Route to the cheapest or fastest model that fits your needs. 3,000+ stars.
OpenRouter — Unified API for 200+ AI Models
Single API to access 200+ AI models from OpenAI, Anthropic, Google, Meta, Mistral, and more. OpenAI-compatible format, automatic fallbacks, and usage-based pricing.
LLM Gateway Comparison — LiteLLM vs OpenRouter vs CF
In-depth comparison of LLM API gateways: LiteLLM (self-hosted proxy), OpenRouter (unified API), and Cloudflare AI Gateway (edge cache). Architecture, pricing, and when to use each.
Frequently Asked Questions
OpenRouter vs LiteLLM?
OpenRouter is a managed service (they hold keys, bill you, take a markup). LiteLLM is a self-hosted proxy (you hold keys, get direct-provider bills). OpenRouter for speed and flexibility; LiteLLM for control and compliance.
How much does OpenRouter add on top of provider prices?
Typically 5-10% markup, model-dependent. Some open-source models cost less on OpenRouter than the advertised provider price due to OpenRouter’s volume agreements. Compare at openrouter.ai/models for each model’s current rate.
Does OpenRouter support tool calls / function calling?
Yes — on models that support it (OpenAI, Claude, Gemini, many open models via their respective runtimes). The API mirrors OpenAI’s tool-calling shape.
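A sketch of the OpenAI-shape tool-calling flow that OpenRouter mirrors. The `tools` schema below is the standard OpenAI function-tool format; `get_weather` and its stubbed result are made-up examples, not a real API.

```python
import json

# Standard OpenAI `tools` schema — OpenRouter passes this through to
# models that support tool calling. `get_weather` is a hypothetical tool.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def dispatch(tool_call):
    """Route a model-issued tool call to a local function."""
    args = json.loads(tool_call["function"]["arguments"])
    if tool_call["function"]["name"] == "get_weather":
        return {"city": args["city"], "temp_c": 21}  # stubbed result
    raise ValueError(f"unknown tool: {tool_call['function']['name']}")

# resp = client.chat.completions.create(
#     model="anthropic/claude-3.5-sonnet",
#     messages=[{"role": "user", "content": "Weather in Oslo?"}],
#     tools=tools,
# )
# Feed dispatch(...)'s result back to the model as a role="tool" message.
fake_call = {"function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'}}
print(dispatch(fake_call))
```

Because the shape is identical to OpenAI's, existing tool-calling code usually works unchanged when pointed at OpenRouter — only the model string differs.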
Can I use OpenRouter with Claude Code / Cursor / Cline?
Yes. These tools accept any OpenAI-compatible endpoint. Point them at https://openrouter.ai/api/v1 with your OpenRouter key and pick any supported model.
Is there a data retention concern?
OpenRouter logs metadata (which model, tokens, latency) by default. Prompt/response content logging is opt-in per-request via headers. For strict zero-retention, check individual providers' policies and enable the "OpenRouter ignore" header — or use LiteLLM with direct provider keys.