# OpenRouter Auto Routing — Pick the Best Model per Query

> OpenRouter Auto routes each query to the optimal model, balancing cost, latency, and capability. Set `model="openrouter/auto"` and the router decides per prompt.

## Quick Use

1. Have an OpenRouter API key
2. In any OpenAI SDK call, use `model="openrouter/auto"`
3. Optional: pass `extra_body={"models": [...], "provider": {"sort": "price"}}` to constrain routing

---

## Intro

OpenRouter Auto Routing picks the best model for each prompt automatically: it analyzes the task, then routes to the best balance of cost, latency, and capability. Cheap chitchat goes to Llama 3.3 on Groq; complex code goes to Claude Sonnet; long-context retrieval goes to Gemini Pro.

- **Best for:** apps with diverse query types where one fixed model is either too expensive or too weak
- **Works with:** any OpenAI SDK pointing at OpenRouter
- **Setup time:** 1 minute

---

### Use auto routing

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Each call is routed independently
quick = client.chat.completions.create(
    model="openrouter/auto",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
# → routed to a cheap, fast model (e.g. Llama 3.3 on Groq, ~$0.0001)

complex_task = client.chat.completions.create(
    model="openrouter/auto",
    messages=[{"role": "user", "content": "Refactor this 500-line Python file..."}],
)
# → routed to a coding model (e.g. Claude Sonnet, ~$0.05)

print(quick.model)         # "meta-llama/llama-3.3-70b-instruct"
print(complex_task.model)  # "anthropic/claude-3.5-sonnet"
```

The actual model used is in `response.model`. Log it with PostHog or Helicone for cost analysis.

### Constrain the auto-pool

```python
extra_body = {
    "models": [
        "anthropic/claude-3.5-sonnet",
        "anthropic/claude-3.5-haiku",
        "openai/gpt-4o-mini",
    ],  # Auto picks the best from THIS list
}

response = client.chat.completions.create(
    model="openrouter/auto",
    messages=[...],
    extra_body=extra_body,
)
```

Useful when you have data-residency or compliance constraints and only certain providers are allowed.

### Provider preferences

```python
extra_body = {
    "models": ["openrouter/auto"],
    "provider": {
        "sort": "throughput",  # "price" | "latency" | "throughput"
        "data_collection": "deny",
        "allow_fallbacks": True,
    },
}
```

- `sort: "price"` → cheapest provider that meets the prompt's needs
- `sort: "latency"` → fastest time to first byte
- `sort: "throughput"` → highest tokens/sec for streaming

### When NOT to use auto

- You have benchmarked your prompts on one specific model; pinning that model is safer
- Compliance requires a specific deployment region (pin the provider instead)
- You need exact cost predictability (auto routing means variable cost)

---

### FAQ

**Q: How accurate is auto routing?**
A: Good for low-stakes tasks, mediocre for nuanced ones. The router uses heuristics plus a fast classifier on the prompt. For prompts near the boundary (medium complexity) it can pick a model that's slightly under-spec'd. Constrain the pool when stakes matter.

**Q: Does auto routing increase latency?**
A: Negligibly. The routing decision adds roughly 10-50 ms before the actual call, and the fastest tier (Groq Llama for chitchat) often more than makes up for it.

**Q: Can I see what auto picked?**
A: Yes: `response.model` returns the actual model used. Log it for analysis; PostHog LLM Observability shows it as a property on each call. A minimal logging sketch follows below.
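To make that last answer concrete, here is a minimal sketch of per-call logging for cost analysis. It relies only on fields the standard OpenAI SDK response exposes (`response.model`, `response.usage`); the `route_and_log` helper and the `router_log.csv` path are hypothetical names for illustration, not part of OpenRouter's API.

```python
import csv
import os
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

def route_and_log(messages, log_path="router_log.csv"):
    """Call auto routing, then append (model, tokens, latency) to a local CSV."""
    start = time.monotonic()
    response = client.chat.completions.create(
        model="openrouter/auto",
        messages=messages,
    )
    elapsed = time.monotonic() - start
    usage = response.usage  # token counts from the standard OpenAI SDK response
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([
            response.model,           # the model auto actually routed to
            usage.prompt_tokens,
            usage.completion_tokens,
            f"{elapsed:.2f}",         # wall-clock seconds for the call
        ])
    return response

reply = route_and_log([{"role": "user", "content": "What is 2+2?"}])
print(reply.model)  # e.g. "meta-llama/llama-3.3-70b-instruct"
```

Group the CSV rows by the model column to see which tiers your traffic actually lands on; a hosted tool like PostHog or Helicone gives you the same breakdown without the homegrown file.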
---

## Source & Thanks

> Built by [OpenRouter](https://github.com/OpenRouterTeam). Commercial product.
>
> [openrouter.ai/docs](https://openrouter.ai/docs/auto-routing): Auto Routing docs

---

Source: https://tokrepo.com/en/workflows/openrouter-auto-routing-pick-the-best-model-per-query
Author: OpenRouter