Quick Use
- Have an OpenRouter API key
- In any OpenAI SDK call, use model="openrouter/auto"
- Optional: pass extra_body={"models": [...], "provider": {"sort": "price"}} to constrain which models and providers the router may pick
Intro
OpenRouter Auto Routing picks the best model for each prompt automatically: it analyzes the task, then routes based on a balance of cost, latency, and capability. Cheap chitchat goes to Llama 3.3 on Groq; complex code goes to Claude Sonnet; long-context retrieval goes to Gemini Pro. Best for: apps with diverse query types where one fixed model is either too expensive or too weak. Works with: any OpenAI SDK pointing at OpenRouter. Setup time: 1 minute.
Use auto routing
import os

from openai import OpenAI

# Point the standard OpenAI SDK at OpenRouter
client = OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key=os.environ["OPENROUTER_API_KEY"],
)
# Each call is routed independently
quick = client.chat.completions.create(
model="openrouter/auto",
messages=[{"role": "user", "content": "What is 2+2?"}],
)
# → routed to a cheap, fast model (e.g. Llama 3.3 on Groq, ~$0.0001)
complex_task = client.chat.completions.create(
model="openrouter/auto",
messages=[{"role": "user", "content": "Refactor this 500-line Python file..."}],
)
# → routed to a coding model (e.g. Claude Sonnet, ~$0.05)
print(quick.model) # "meta-llama/llama-3.3-70b-instruct"
print(complex_task.model)  # "anthropic/claude-3.5-sonnet"

The actual model used is in response.model. Log it with PostHog or Helicone for cost analysis.
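A minimal local version of that logging, if you just want a tally before wiring up a full observability tool. This is a sketch: tracked_completion and the two counters are illustrative names, not part of any SDK; the usage fields come from the standard OpenAI SDK response.

import collections

route_counts = collections.Counter()
token_counts = collections.Counter()

def tracked_completion(client, messages, **kwargs):
    # Same auto-routed call as above, but record which model was chosen
    # and how many tokens it used, for later cost analysis.
    response = client.chat.completions.create(
        model="openrouter/auto",
        messages=messages,
        **kwargs,
    )
    route_counts[response.model] += 1
    if response.usage is not None:
        token_counts[response.model] += response.usage.total_tokens
    return response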
Constrain the auto-pool
extra_body = {
"models": [
"anthropic/claude-3.5-sonnet",
"anthropic/claude-3.5-haiku",
"openai/gpt-4o-mini",
],
# Auto picks the best from THIS list
}
response = client.chat.completions.create(
model="openrouter/auto",
messages=[...],
extra_body=extra_body,
)

Useful when you have data-residency or compliance constraints and only certain providers are allowed.
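One way to keep that constraint in a single place is a small wrapper that always applies your approved list. A sketch only: APPROVED_MODELS and compliant_completion are illustrative names, and the list itself is just an example.

# Hypothetical allow-list of models cleared for your compliance requirements.
APPROVED_MODELS = [
    "anthropic/claude-3.5-sonnet",
    "anthropic/claude-3.5-haiku",
    "openai/gpt-4o-mini",
]

def compliant_completion(client, messages, **kwargs):
    # Auto routing still picks per prompt, but only from APPROVED_MODELS.
    return client.chat.completions.create(
        model="openrouter/auto",
        messages=messages,
        extra_body={"models": APPROVED_MODELS},
        **kwargs,
    )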
Provider preferences
extra_body = {
"models": ["openrouter/auto"],
"provider": {
"sort": "throughput", # "price" | "latency" | "throughput"
"data_collection": "deny",
"allow_fallbacks": True,
},
}

- sort: price → cheapest provider that meets the prompt's needs.
- sort: latency → fastest first-byte time.
- sort: throughput → highest tokens/sec for streaming.
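For example, a streaming call that prefers high-throughput providers could look like this. A sketch reusing the client from above; the prompt is arbitrary.

stream = client.chat.completions.create(
    model="openrouter/auto",
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet in detail."}],
    stream=True,
    extra_body={
        "provider": {
            "sort": "throughput",   # prefer the provider with the highest tokens/sec
            "allow_fallbacks": True,
        },
    },
)
for chunk in stream:
    # Each chunk carries an incremental delta; print tokens as they arrive.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)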
When NOT to use auto
- You have benchmarked your prompts on one specific model — pinning is safer
- Compliance requires a specific deployment region (provider-pin instead)
- You need exact cost predictability (auto = variable cost)
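In those cases, pinning is just naming the model directly instead of openrouter/auto. A sketch using the same client as above; the model string is one of the examples used earlier.

# Pinned: every call goes to the same model, so cost and behavior are predictable.
pinned = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Refactor this 500-line Python file..."}],
)
print(pinned.model)  # "anthropic/claude-3.5-sonnet"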
FAQ
Q: How accurate is auto routing? A: Good for low-stakes tasks, mediocre for nuanced ones. The router uses heuristics + a fast classifier on the prompt. For prompts at the boundary (medium complexity) it can pick a model that's slightly under-spec'd. Constrain the pool when stakes matter.
Q: Does auto routing increase latency? A: Negligibly — the routing decision adds ~10-50ms before the actual call. The fastest tier (Groq Llama for chitchat) often more than makes up for it.
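If you want to check the overhead on your own prompts, a rough comparison is easy to script. Note this sketch times full round trips with the client from above, so it includes each model's generation speed, not just the routing decision.

import time

def time_call(model_name):
    # Time one complete request against the given model slug.
    start = time.perf_counter()
    client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": "What is 2+2?"}],
    )
    return time.perf_counter() - start

print("auto:  ", time_call("openrouter/auto"))
print("pinned:", time_call("meta-llama/llama-3.3-70b-instruct"))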
Q: Can I see what auto picked?
A: Yes — response.model returns the actual model used. Log this for analysis. PostHog LLM Observability shows it as a property on each call.
Source & Thanks
Built by OpenRouter. Commercial product.
openrouter.ai/docs — Auto Routing docs