AI Gateway
OpenRouter — Unified API for 300+ Models, One Invoice


OpenRouter is a hosted routing service that puts 300+ LLMs (OpenAI, Claude, Gemini, plus open models served on Groq/Together/Fireworks) behind a single OpenAI-compatible API, settled on one bill.

Why Choose It

OpenRouter solves a specific pain: "I want to try 10 models this week without signing 10 vendor contracts". Top up credit once, access every major closed and open model behind one API key, switch models by changing a string. The pricing is pay-per-token — a small markup over direct provider prices in exchange for zero setup and unified billing.

It’s the fastest way to benchmark models on your workload. Prompt caching, streaming, tool calls, and vision all work uniformly. You can A/B test Claude 3.5 Sonnet against Gemini 2.0 Pro against Llama 3.3 70B in an afternoon.

Where it’s not the right answer: when you need direct vendor relationships (enterprise contracts, zero-retention SLAs, regional data residency) or when the per-token markup matters at your volume. At 10M+ tokens per month, going direct with LiteLLM proxying your own keys is often cheaper and gives you contractual leverage.

Quick Start — OpenAI SDK + Model String

HTTP-Referer and X-Title are optional but recommended — they make your app show up on the OpenRouter leaderboard (useful for attribution). Model names follow provider/model-slug. The "openrouter/auto" model leaves routing to OpenRouter’s cost optimizer.

# pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="sk-or-...",
    base_url="https://openrouter.ai/api/v1",
    default_headers={
        "HTTP-Referer": "https://tokrepo.com",
        "X-Title": "TokRepo AI Gateway Example",
    },
)

# Switch models by changing the string — same code path
for model in [
    "anthropic/claude-3.5-sonnet",
    "openai/gpt-4o-mini",
    "google/gemini-2.0-flash-001",
    "meta-llama/llama-3.3-70b-instruct",
]:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Name one thing you do best."}],
        max_tokens=80,
    )
    print(f"{model}: {resp.choices[0].message.content.strip()}")
    print(f"  usage: {resp.usage.total_tokens} tokens")

# Advanced: let OpenRouter pick cheapest available provider
# model="openrouter/auto"  → auto-routes based on cost + availability.
# Or use OR-specific params for provider preferences and fallbacks.
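The OR-specific fallback parameters mentioned above can be sketched via the OpenAI SDK's extra_body passthrough. The "models" field (an ordered fallback list) is taken from OpenRouter's routing docs; treat the exact field name as an assumption and verify it at openrouter.ai/docs before relying on it.

```python
# Sketch: OpenRouter-specific request fields passed through the OpenAI SDK's
# extra_body escape hatch. Field names per OpenRouter's docs; verify before use.

def fallback_extras(*fallback_models: str) -> dict:
    """Build an extra_body dict asking OpenRouter to retry other models."""
    # If the primary model errors out or is rate-limited, OpenRouter tries
    # these slugs in order instead of failing the request.
    return {"models": list(fallback_models)}

extras = fallback_extras(
    "openai/gpt-4o-mini",
    "meta-llama/llama-3.3-70b-instruct",
)
# resp = client.chat.completions.create(
#     model="anthropic/claude-3.5-sonnet",
#     messages=[{"role": "user", "content": "..."}],
#     extra_body=extras,
# )
```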

Core Capabilities

300+ models behind one API

Closed models (OpenAI, Anthropic, Google, Cohere), major open models (Llama, Mistral, Qwen, DeepSeek), and specialty models (Perplexity online, vision models). One API key for everything.

Automatic provider fallback

OpenRouter keeps multiple upstream providers per open model (Groq, Together, Fireworks, Anyscale). If one is down or slow, it retries with another transparently.

Pay-per-token, no minimums

Top up credit, pay only for what you use. No monthly fees, no per-provider subscriptions. Cost visible per-request in response headers.
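Per-request cost can also be looked up after the fact. This sketch assumes OpenRouter's generation-stats endpoint (GET /api/v1/generation?id=...) and its response shape as documented; both should be verified against the current API reference before depending on them.

```python
# Sketch: fetching the billed cost of a completed request from OpenRouter's
# generation-stats endpoint. Endpoint path and response fields are assumptions
# taken from OpenRouter's docs; verify before relying on them.
import json
import urllib.parse
import urllib.request

def generation_stats_url(generation_id: str) -> str:
    """URL for the per-generation stats lookup."""
    query = urllib.parse.urlencode({"id": generation_id})
    return f"https://openrouter.ai/api/v1/generation?{query}"

def generation_cost(generation_id: str, api_key: str) -> float:
    """Fetch the billed USD cost for a completed request by its id."""
    req = urllib.request.Request(
        generation_stats_url(generation_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["data"]["total_cost"]

# usage: cost = generation_cost(resp.id, "sk-or-...")  # resp.id from a completion
```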

Provider preferences

Request-time headers to prefer specific providers, regions, or pricing tiers. Useful for compliance ("EU providers only") or performance ("prefer Groq").
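A compliance-style preference payload might look like the sketch below. The field names ("provider", "order", "allow_fallbacks", "data_collection") come from OpenRouter's provider-routing docs and should be treated as assumptions to double-check, not a definitive schema.

```python
# Sketch: provider-routing preferences for compliance-style constraints,
# sent via the OpenAI SDK's extra_body. Field names are assumptions taken
# from OpenRouter's provider-routing docs; verify before use.

def restricted_provider_prefs(allowed_providers, deny_data_collection=True):
    """extra_body fragment pinning routing to an allowlist of providers."""
    prefs = {
        "order": list(allowed_providers),
        "allow_fallbacks": False,  # never route outside the allowlist
    }
    if deny_data_collection:
        prefs["data_collection"] = "deny"  # skip providers that retain prompts
    return {"provider": prefs}

body = restricted_provider_prefs(["Azure", "Groq"])
# client.chat.completions.create(..., extra_body=body)
```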

Free tier models

A rotating set of free models (e.g., some smaller Llama and Gemma variants) for experimentation. Rate-limited but useful for prototyping.

App attribution

Apps can register with OpenRouter for leaderboards and default routing rules. Good distribution channel for public AI tools.

Comparison

| Tool | Type | Model Count | Billing | Self-host? |
|---|---|---|---|---|
| OpenRouter (this) | Managed router | 300+ | Unified (top-up + per-token) | No |
| LiteLLM | Self-hosted proxy + SDK | 100+ providers | BYO keys per provider | Yes |
| Together AI | Hosted open-source inference | ~50 OSS models | Per-token | No |
| Groq | Specialty fast inference | ~20 OSS models | Per-token | No |

Use Cases

01. Model benchmarking

Run your actual prompts against a dozen models in an afternoon. Compare quality and cost before committing to a primary provider.

02. Fast prototyping

Side projects, weekend hacks, demos — one topup, every model available. Avoids the "I only want $5 of Claude" friction of direct vendor signup.

03. Apps that let users pick a model

Chatbots and AI wrappers that expose model choice to end users. OpenRouter is the cleanest way to offer 10+ options without 10+ integrations.
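When exposing model choice to end users, the user-supplied slug should be validated before it is forwarded to OpenRouter. A minimal sketch (model slugs are illustrative; check openrouter.ai/models for current ones):

```python
# Sketch: safely mapping an untrusted user selection to a known model slug.
# The allowlist entries are illustrative examples, not an endorsement.

ALLOWED_MODELS = {
    "anthropic/claude-3.5-sonnet",
    "openai/gpt-4o-mini",
    "meta-llama/llama-3.3-70b-instruct",
}
DEFAULT_MODEL = "openai/gpt-4o-mini"

def resolve_model(user_choice):
    """Return the user's choice if allowlisted, else fall back to the default."""
    if user_choice in ALLOWED_MODELS:
        return user_choice
    return DEFAULT_MODEL
```

This keeps arbitrary (and arbitrarily priced) model strings out of your billing path while still offering real choice.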

Pricing & Licensing

Per-token pricing: direct upstream cost plus a small (typically 5-10%) markup. Exact rates per model shown at openrouter.ai/models. No monthly fees.

Free tier: limited free models (rate-limited, rotating list) for experimentation. Useful for dev/testing without spend.

At scale, compare against direct: for single-model high-volume workloads, direct provider relationships often beat OpenRouter’s markup. OpenRouter wins on flexibility and multi-model cost; direct wins on volume discounts and compliance.
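A back-of-envelope comparison makes the trade-off concrete. The direct price and the 5% markup below are illustrative assumptions; real per-model rates are listed at openrouter.ai/models.

```python
# Back-of-envelope markup comparison. All numbers are illustrative
# assumptions, not actual OpenRouter or provider rates.

direct_price_per_mtok = 3.00   # USD per 1M tokens, direct (illustrative)
markup = 0.05                  # assumed OpenRouter markup
monthly_tokens = 10_000_000    # 10M tokens/month

direct_cost = monthly_tokens / 1_000_000 * direct_price_per_mtok
openrouter_cost = direct_cost * (1 + markup)
print(f"direct: ${direct_cost:.2f}/mo  "
      f"openrouter: ${openrouter_cost:.2f}/mo  "
      f"delta: ${openrouter_cost - direct_cost:.2f}")
# → direct: $30.00/mo  openrouter: $31.50/mo  delta: $1.50
```

At this volume the markup is pocket change; multiply the token count by 100x and the delta starts paying for an engineer's afternoon of direct-integration work.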

Related TokRepo Assets

FAQ

OpenRouter vs LiteLLM?

OpenRouter is a managed service (they hold keys, bill you, take a markup). LiteLLM is a self-hosted proxy (you hold keys, get direct-provider bills). OpenRouter for speed and flexibility; LiteLLM for control and compliance.

How much does OpenRouter add on top of provider prices?

Typically 5-10% markup, model-dependent. Some open-source models cost less on OpenRouter than the advertised provider price due to OpenRouter’s volume agreements. Compare at openrouter.ai/models for each model’s current rate.

Does OpenRouter support tool calls / function calling?

Yes — on models that support it (OpenAI, Claude, Gemini, many open models via their respective runtimes). The API mirrors OpenAI’s tool-calling shape.
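The request shape is the standard OpenAI tools schema, which OpenRouter passes through for models that support it. A minimal sketch; get_weather is a hypothetical example tool, not part of any API:

```python
# Sketch: OpenAI-shaped tool definition sent through OpenRouter. This is the
# standard OpenAI function-calling schema; "get_weather" is a hypothetical
# tool used only for illustration.

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# client = OpenAI(api_key="sk-or-...", base_url="https://openrouter.ai/api/v1")
# resp = client.chat.completions.create(
#     model="anthropic/claude-3.5-sonnet",
#     messages=[{"role": "user", "content": "Weather in Berlin?"}],
#     tools=tools,
# )
# resp.choices[0].message.tool_calls  # populated when the model calls the tool
```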

Can I use OpenRouter with Claude Code / Cursor / Cline?

Yes. These tools accept any OpenAI-compatible endpoint. Point them at https://openrouter.ai/api/v1 with your OpenRouter key and pick any supported model.
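For tools that read the OpenAI SDK's standard environment variables, the configuration is two exports. Variable names below are the OpenAI SDK conventions; some editors use their own setting names, so check each tool's docs.

```shell
# Point any OpenAI-compatible tool at OpenRouter. OPENAI_BASE_URL and
# OPENAI_API_KEY are the OpenAI SDK's standard env vars; individual tools
# may use their own setting names instead.
export OPENAI_BASE_URL="https://openrouter.ai/api/v1"
export OPENAI_API_KEY="sk-or-..."   # your OpenRouter key

# Sanity check (uncomment to run against the live API):
# curl -s "$OPENAI_BASE_URL/models" -H "Authorization: Bearer $OPENAI_API_KEY"
```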

Is there a data retention concern?+

OpenRouter logs metadata (which model, token counts, latency) by default. Prompt/response content logging is opt-in per request via headers. For a fully zero-retention setup, check the policies of the specific upstream providers and enable the "OpenRouter ignore" header, or use LiteLLM with direct provider keys.

Similar Tools