OpenRouter — Unified API for 300+ Models, One Invoice
OpenRouter is a managed router that exposes 300+ LLMs (OpenAI, Claude, Gemini, open-source via Groq/Together/Fireworks) behind a single OpenAI-compatible API and one consolidated bill.
Why OpenRouter
OpenRouter solves a specific pain: "I want to try 10 models this week without signing 10 vendor contracts." Top up credit once, access every major closed and open model with one API key, and switch models by changing a string. Pricing is pay-per-token — a small markup over direct provider prices in exchange for zero setup and unified billing.
It’s the fastest way to benchmark models on your workload. Prompt caching, streaming, tool calls, and vision all work uniformly. You can A/B test Claude 3.5 Sonnet against Gemini 2.0 Pro against Llama 3.3 70B in an afternoon.
Where it’s not the right answer: when you need direct vendor relationships (enterprise contracts, zero-retention SLAs, regional data residency) or when the per-token markup matters at your volume. At 10M+ tokens per month, going direct (e.g., running LiteLLM as a proxy over your own provider keys) is often cheaper and gives you contractual leverage.
Quick Start — OpenAI SDK + Model String
HTTP-Referer and X-Title are optional but recommended — they make your app show up on the OpenRouter leaderboard (useful for attribution). Model names follow provider/model-slug. The "openrouter/auto" model leaves routing to OpenRouter’s cost optimizer.
# pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="sk-or-...",
    base_url="https://openrouter.ai/api/v1",
    default_headers={
        "HTTP-Referer": "https://tokrepo.com",
        "X-Title": "TokRepo AI Gateway Example",
    },
)

# Switch models by changing the string — same code path
for model in [
    "anthropic/claude-3.5-sonnet",
    "openai/gpt-4o-mini",
    "google/gemini-2.0-flash-001",
    "meta-llama/llama-3.3-70b-instruct",
]:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Name one thing you do best."}],
        max_tokens=80,
    )
    print(f"{model}: {resp.choices[0].message.content.strip()}")
    print(f"  usage: {resp.usage.total_tokens} tokens")

# Advanced: let OpenRouter pick the cheapest available provider
# model="openrouter/auto" → auto-routes based on cost + availability.
# Or use OR-specific params for provider preferences and fallbacks.
Key Features
300+ models behind one API
Closed models (OpenAI, Anthropic, Google, Cohere), major open models (Llama, Mistral, Qwen, DeepSeek), and specialty models (Perplexity online, vision models). One API key for everything.
Automatic provider fallback
OpenRouter keeps multiple upstream providers per open model (Groq, Together, Fireworks, Anyscale). If one is down or slow, it retries with another transparently.
Pay-per-token, no minimums
Top up credit, pay only for what you use. No monthly fees, no per-provider subscriptions. Cost visible per-request in response headers.
Provider preferences
Request-time headers to prefer specific providers, regions, or pricing tiers. Useful for compliance ("EU providers only") or performance ("prefer Groq").
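A minimal sketch of what a provider-preference request can look like through the OpenAI SDK, using OpenRouter's request-body `provider` object passed via `extra_body`. The field names (`order`, `allow_fallbacks`, `data_collection`) follow OpenRouter's documented provider-routing options — verify them against the current API reference before relying on this shape.

```python
# Sketch: OpenRouter provider routing preferences via extra_body.
# Field names are assumptions from OpenRouter's docs — double-check them.

def provider_prefs(order, allow_fallbacks=True, data_collection="deny"):
    """Build the OpenRouter-specific `provider` routing object."""
    return {
        "provider": {
            "order": order,                      # try these upstreams first
            "allow_fallbacks": allow_fallbacks,  # fall through if they fail
            "data_collection": data_collection,  # "deny" = no-logging providers only
        }
    }

extra = provider_prefs(["Groq", "Together"])
# client.chat.completions.create(
#     model="meta-llama/llama-3.3-70b-instruct",
#     messages=[{"role": "user", "content": "hi"}],
#     extra_body=extra,
# )
print(extra["provider"]["order"])
```

The OpenAI SDK merges anything in `extra_body` into the request JSON, which is how OpenRouter-specific parameters ride along without breaking OpenAI compatibility.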
Free tier models
A rotating set of free models (e.g., some smaller Llama and Gemma variants) for experimentation. Rate-limited but useful for prototyping.
App attribution
Apps can register with OpenRouter for leaderboards and default routing rules. Good distribution channel for public AI tools.
Comparison
| Tool | Type | Model Count | Billing | Self-host? |
|---|---|---|---|---|
| OpenRouter (this) | Managed router | 300+ | Unified (top-up + per-token) | No |
| LiteLLM | Self-hosted proxy + SDK | 100+ providers | BYO keys per provider | Yes |
| Together AI | Hosted open-source inference | ~50 OSS models | Per-token | No |
| Groq | Specialty fast inference | ~20 OSS models | Per-token | No |
Use Cases
01. Model benchmarking
Run your actual prompts against a dozen models in an afternoon. Compare quality and cost before committing to a primary provider.
02. Fast prototyping
Side projects, weekend hacks, demos — one top-up, every model available. Avoids the "I only want $5 of Claude" friction of direct vendor signup.
03. Apps that let users pick a model
Chatbots and AI wrappers that expose model choice to end users. OpenRouter is the cleanest way to offer 10+ options without 10+ integrations.
Pricing & License
Per-token pricing: direct upstream cost plus a small (typically 5-10%) markup. Exact rates per model shown at openrouter.ai/models. No monthly fees.
Free tier: limited free models (rate-limited, rotating list) for experimentation. Useful for dev/testing without spend.
At scale, compare against direct: for single-model high-volume workloads, direct provider relationships often beat OpenRouter’s markup. OpenRouter wins on flexibility and multi-model cost; direct wins on volume discounts and compliance.
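The break-even math above is easy to sketch. All rates here are hypothetical placeholders — look up real per-model prices at openrouter.ai/models and on each provider's pricing page.

```python
# Back-of-envelope: routed vs direct cost at volume.
# The $3.00/1M-token rate and 5% markup are hypothetical examples.

def monthly_cost(tokens, price_per_mtok, markup=0.0):
    """Cost in USD for `tokens` tokens at `price_per_mtok` dollars per 1M tokens."""
    return tokens / 1_000_000 * price_per_mtok * (1 + markup)

tokens = 10_000_000  # 10M tokens/month, the break-even zone mentioned above
direct = monthly_cost(tokens, price_per_mtok=3.00)               # direct contract
routed = monthly_cost(tokens, price_per_mtok=3.00, markup=0.05)  # +5% markup

print(f"direct: ${direct:.2f}")            # $30.00
print(f"routed: ${routed:.2f}")            # $31.50
print(f"delta:  ${routed - direct:.2f}")   # $1.50/month
```

At this scale the markup is pocket change; the comparison only tips toward direct contracts once volume discounts, committed-use pricing, or compliance terms enter the picture.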
Related Assets on TokRepo
LLM Gateway Comparison — Proxy Your AI Requests
Compare top LLM gateway and proxy tools for routing AI requests. Covers LiteLLM, Bifrost, Portkey, and OpenRouter for cost optimization, failover, and multi-provider access.
OpenRouter — Unified LLM API with Smart Routing
Single API endpoint for 200+ LLM models with automatic fallbacks, price comparison, and usage tracking. Route to the cheapest or fastest model that fits your needs. 3,000+ stars.
OpenRouter — Unified API for 200+ AI Models
Single API to access 200+ AI models from OpenAI, Anthropic, Google, Meta, Mistral, and more. OpenAI-compatible format, automatic fallbacks, and usage-based pricing.
LLM Gateway Comparison — LiteLLM vs OpenRouter vs CF
In-depth comparison of LLM API gateways: LiteLLM (self-hosted proxy), OpenRouter (unified API), and Cloudflare AI Gateway (edge cache). Architecture, pricing, and when to use each.
Frequently Asked Questions
OpenRouter vs LiteLLM?
OpenRouter is a managed service (they hold keys, bill you, take a markup). LiteLLM is a self-hosted proxy (you hold keys, get direct-provider bills). OpenRouter for speed and flexibility; LiteLLM for control and compliance.
How much does OpenRouter add on top of provider prices?
Typically 5-10% markup, model-dependent. Some open-source models cost less on OpenRouter than the advertised provider price due to OpenRouter’s volume agreements. Compare at openrouter.ai/models for each model’s current rate.
Does OpenRouter support tool calls / function calling?
Yes — on models that support it (OpenAI, Claude, Gemini, many open models via their respective runtimes). The API mirrors OpenAI’s tool-calling shape.
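A sketch of the OpenAI-shape tool-calling flow that OpenRouter mirrors. The `tools` schema below is the standard OpenAI function-tool format; `get_weather` and its stubbed result are made-up examples, not a real API.

```python
import json

# Standard OpenAI `tools` schema — OpenRouter passes this through to
# models that support tool calling. `get_weather` is a hypothetical tool.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def dispatch(tool_call):
    """Route a model-issued tool call to a local function."""
    args = json.loads(tool_call["function"]["arguments"])
    if tool_call["function"]["name"] == "get_weather":
        return {"city": args["city"], "temp_c": 21}  # stubbed result
    raise ValueError(f"unknown tool: {tool_call['function']['name']}")

# resp = client.chat.completions.create(
#     model="anthropic/claude-3.5-sonnet",
#     messages=[{"role": "user", "content": "Weather in Oslo?"}],
#     tools=tools,
# )
# Feed dispatch(...)'s result back to the model as a role="tool" message.
fake_call = {"function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'}}
print(dispatch(fake_call))
```

Because the shape is identical to OpenAI's, existing tool-calling code usually works unchanged when pointed at OpenRouter — only the model string differs.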
Can I use OpenRouter with Claude Code / Cursor / Cline?
Yes. These tools accept any OpenAI-compatible endpoint. Point them at https://openrouter.ai/api/v1 with your OpenRouter key and pick any supported model.
Is there a data retention concern?
OpenRouter logs metadata (which model, tokens, latency) by default. Prompt/response content logging is opt-in per-request via headers. For strict zero-retention, check individual providers' policies and enable the "OpenRouter ignore" header — or use LiteLLM with direct provider keys.