# OpenRouter Auto Routing — Pick the Best Model per Query

> OpenRouter Auto routes each query to the optimal model, balancing cost, latency, and capability. Set `model="openrouter/auto"` and the router decides per prompt.

## Quick Use

1. Have an OpenRouter API key
2. In any OpenAI SDK call, use `model="openrouter/auto"`
3. Optional: pass `extra_body={"models": [...], "provider": {"sort": "price"}}` to constrain routing

---

## Intro

OpenRouter Auto Routing picks the best model for each prompt automatically: it analyzes the task, then routes to the best balance of cost, latency, and capability. Cheap chitchat goes to Llama 3.3 on Groq; complex code goes to Claude Sonnet; long-context retrieval goes to Gemini Pro.

- **Best for:** apps with diverse query types where one fixed model is either too expensive or too weak
- **Works with:** any OpenAI SDK pointing at OpenRouter
- **Setup time:** 1 minute

---

### Use auto routing

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Each call is routed independently
quick = client.chat.completions.create(
    model="openrouter/auto",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
# → routed to a cheap, fast model (e.g. Llama 3.3 on Groq, ~$0.0001)

complex_task = client.chat.completions.create(
    model="openrouter/auto",
    messages=[{"role": "user", "content": "Refactor this 500-line Python file..."}],
)
# → routed to a coding model (e.g. Claude Sonnet, ~$0.05)

print(quick.model)         # "meta-llama/llama-3.3-70b-instruct"
print(complex_task.model)  # "anthropic/claude-3.5-sonnet"
```

The actual model used is in `response.model`. Log it with PostHog or Helicone for cost analysis.

### Constrain the auto-pool

```python
extra_body = {
    "models": [
        "anthropic/claude-3.5-sonnet",
        "anthropic/claude-3.5-haiku",
        "openai/gpt-4o-mini",
    ],  # Auto picks the best from THIS list
}

response = client.chat.completions.create(
    model="openrouter/auto",
    messages=[...],
    extra_body=extra_body,
)
```

Useful when you have data-residency or compliance constraints and only certain providers are allowed.

### Provider preferences

```python
extra_body = {
    "models": ["openrouter/auto"],
    "provider": {
        "sort": "throughput",  # "price" | "latency" | "throughput"
        "data_collection": "deny",
        "allow_fallbacks": True,
    },
}
```

- `sort: "price"` → cheapest provider that meets the prompt's needs
- `sort: "latency"` → fastest time to first byte
- `sort: "throughput"` → highest tokens/sec for streaming

### When NOT to use auto

- You have benchmarked your prompts on one specific model; pinning that model is safer
- Compliance requires a specific deployment region (pin the provider instead)
- You need exact cost predictability (auto routing means variable cost)

---

### FAQ

**Q: How accurate is auto routing?**
A: Good for low-stakes tasks, mediocre for nuanced ones. The router uses heuristics plus a fast classifier on the prompt. For prompts near the boundary (medium complexity) it can pick a model that's slightly under-spec'd. Constrain the pool when stakes matter.

**Q: Does auto routing increase latency?**
A: Negligibly. The routing decision adds roughly 10-50 ms before the actual call, and the fastest tier (Groq Llama for chitchat) often more than makes up for it.

**Q: Can I see what auto picked?**
A: Yes: `response.model` returns the actual model used. Log it for analysis; PostHog LLM Observability shows it as a property on each call. A minimal logging sketch follows below.
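To make that last answer concrete, here is a minimal sketch of per-call logging for cost analysis. It relies only on fields the standard OpenAI SDK response exposes (`response.model`, `response.usage`); the `route_and_log` helper and the `router_log.csv` path are hypothetical names for illustration, not part of OpenRouter's API.

```python
import csv
import os
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

def route_and_log(messages, log_path="router_log.csv"):
    """Call auto routing, then append (model, tokens, latency) to a local CSV."""
    start = time.monotonic()
    response = client.chat.completions.create(
        model="openrouter/auto",
        messages=messages,
    )
    elapsed = time.monotonic() - start
    usage = response.usage  # token counts from the standard OpenAI SDK response
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([
            response.model,           # the model auto actually routed to
            usage.prompt_tokens,
            usage.completion_tokens,
            f"{elapsed:.2f}",         # wall-clock seconds for the call
        ])
    return response

reply = route_and_log([{"role": "user", "content": "What is 2+2?"}])
print(reply.model)  # e.g. "meta-llama/llama-3.3-70b-instruct"
```

Group the CSV rows by the model column to see which tiers your traffic actually lands on; a hosted tool like PostHog or Helicone gives you the same breakdown without the homegrown file.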
---

## Source & Thanks

> Built by [OpenRouter](https://github.com/OpenRouterTeam). Commercial product.
>
> [openrouter.ai/docs](https://openrouter.ai/docs/auto-routing): Auto Routing docs

---

Source: https://tokrepo.com/en/workflows/openrouter-auto-routing-pick-the-best-model-per-query
Author: OpenRouter