What is OpenRouter Auto Routing — Pick the Best Model per Query?

OpenRouter Auto routes each query to the model that best balances cost, latency, and capability. Set model=openrouter/auto and the router decides per prompt.

Is OpenRouter Auto Routing — Pick the Best Model per Query free to use?

Yes. OpenRouter Auto Routing — Pick the Best Model per Query is freely available on TokRepo. Check the Source & Thanks section on the asset page for the specific open-source license.

How do I install OpenRouter Auto Routing — Pick the Best Model per Query?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

OpenRouter Auto Routing — Pick the Best Model per Query

Name: OpenRouter Auto Routing — Pick the Best Model per Query
Author: OpenRouter

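Introduction

OpenRouter Auto Routing automatically picks the best model for each prompt: it analyzes the task and routes to the point that balances cost, latency, and capability. For example, cheap chit-chat goes to Llama 3.3 on Groq, complex code goes to Claude Sonnet, and long-context retrieval goes to Gemini Pro. It suits applications with varied query types, where any single fixed model is either too expensive or too weak. It works with any OpenAI SDK pointed at OpenRouter. Setup takes about 1 minute.

Using auto routing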

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Each call is routed independently
quick = client.chat.completions.create(
    model="openrouter/auto",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
# → routed to a cheap, fast model (e.g. Llama 3.3 on Groq, roughly $0.0001)

# Named complex_resp to avoid shadowing the built-in complex type
complex_resp = client.chat.completions.create(
    model="openrouter/auto",
    messages=[{"role": "user", "content": "Refactor this 500-line Python file..."}],
)
# → routed to a coding model (e.g. Claude Sonnet, roughly $0.05)

print(quick.model)         # e.g. "meta-llama/llama-3.3-70b-instruct"
print(complex_resp.model)  # e.g. "anthropic/claude-3.5-sonnet"

The model actually used is in response.model. Log it with PostHog or Helicone for cost analysis.
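Before wiring up an observability tool, a quick local tally can show how calls are being distributed. The following is a minimal sketch, not part of OpenRouter or the OpenAI SDK: `tally_models` is a hypothetical helper, and the stand-in objects only mimic the `.model` attribute of a real response so the snippet runs without an API key.

```python
from collections import Counter
from types import SimpleNamespace


def tally_models(responses):
    """Count how many calls were served by each model.

    Each item only needs a `.model` attribute, like the objects
    returned by client.chat.completions.create().
    """
    return Counter(r.model for r in responses)


# Stand-in responses (hypothetical data) so the sketch runs offline.
fake_responses = [
    SimpleNamespace(model="meta-llama/llama-3.3-70b-instruct"),
    SimpleNamespace(model="anthropic/claude-3.5-sonnet"),
    SimpleNamespace(model="meta-llama/llama-3.3-70b-instruct"),
]

usage = tally_models(fake_responses)
for model_id, count in usage.most_common():
    print(f"{model_id}: {count} call(s)")
```

In a real application, the list would hold the responses returned by the client, and the counts could be joined with per-model pricing to estimate spend.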

Restricting the auto pool

extra_body = {
    "models": [
        "anthropic/claude-3.5-sonnet",
        "anthropic/claude-3.5-haiku",
        "openai/gpt-4o-mini",
    ],
    # Auto picks the best model from this list
}

response = client.chat.completions.create(
    model="openrouter/auto",
    messages=[...],
    extra_body=extra_body,
)

Useful when data-residency or compliance constraints mean only certain providers are allowed.
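When the pool exists for compliance reasons, it can be worth checking that the model that actually served a call was in the pool. A minimal sketch under stated assumptions: `build_pool_extras` and `served_from_pool` are hypothetical helpers, and the check assumes `response.model` returns the same identifier string that was listed in the pool, which may not hold if the API returns a versioned ID.

```python
ALLOWED_POOL = [
    "anthropic/claude-3.5-sonnet",
    "anthropic/claude-3.5-haiku",
    "openai/gpt-4o-mini",
]


def build_pool_extras(pool):
    """Build the extra_body dict that carries the restricted pool."""
    if not pool:
        raise ValueError("pool must contain at least one model id")
    return {"models": list(pool)}


def served_from_pool(model_id, pool):
    """Return True if the model that served the call is in the pool.

    Assumption: exact string match between response.model and pool ids.
    """
    return model_id in set(pool)


extras = build_pool_extras(ALLOWED_POOL)
print(extras)
print(served_from_pool("openai/gpt-4o-mini", ALLOWED_POOL))
print(served_from_pool("google/gemini-pro-1.5", ALLOWED_POOL))
```

Calling `served_from_pool(response.model, ALLOWED_POOL)` after each request gives a cheap audit signal that can be logged alongside the response.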

Provider preferences

extra_body = {
    "models": ["openrouter/auto"],
    "provider": {
        "sort": "throughput",        # "price" | "latency" | "throughput"
        "data_collection": "deny",
        "allow_fallbacks": True,
    },
}

sort: price → the cheapest provider that meets the prompt's requirements.
sort: latency → the fastest time to first byte.
sort: throughput → the highest tokens per second when streaming.
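A typo in the sort value is easy to miss, so a small builder that validates it before the request is sent may help. This is a sketch only: `provider_preferences` is a hypothetical helper, the three sort values are the ones listed above, and `"allow"` is assumed to be the counterpart of `"deny"` for `data_collection`.

```python
VALID_SORTS = {"price", "latency", "throughput"}


def provider_preferences(sort, deny_data_collection=True, allow_fallbacks=True):
    """Build the provider part of extra_body, rejecting unknown sort values."""
    if sort not in VALID_SORTS:
        raise ValueError(
            f"unknown sort {sort!r}; expected one of {sorted(VALID_SORTS)}"
        )
    return {
        "provider": {
            "sort": sort,
            # Assumption: "allow" is the opposite setting of "deny".
            "data_collection": "deny" if deny_data_collection else "allow",
            "allow_fallbacks": allow_fallbacks,
        }
    }


prefs = provider_preferences("throughput")
print(prefs)
```

The returned dict can be passed as `extra_body=prefs` in the same way as the examples above.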

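When not to use auto

You have already benchmarked your prompts on a specific model: pinning is safer
Compliance requires a specific deployment region (pin a provider instead)
You need precise cost predictability (auto means variable cost)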
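One way to keep both options open is to make pinning a configuration switch rather than a code change. The sketch below assumes a hypothetical environment variable, `PINNED_MODEL`, which is not an OpenRouter setting; when it is unset or empty, the helper falls back to auto routing.

```python
import os

AUTO_MODEL = "openrouter/auto"


def choose_model(pinned_model=None):
    """Return the pinned model if one is configured, else the auto router."""
    return pinned_model or AUTO_MODEL


# PINNED_MODEL is a hypothetical variable name used for illustration.
model = choose_model(os.environ.get("PINNED_MODEL"))
print(model)
```

The result can be passed as the `model` argument to `client.chat.completions.create`, so a benchmarked deployment pins its model while other environments keep using auto.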

FAQ

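Q: Is auto routing accurate? A: It is fine for low-stakes tasks and mediocre for nuanced ones. The router analyzes the prompt with heuristics plus a fast classifier. Borderline prompts (medium complexity) may be sent to a weaker model than they need. When it matters, restrict the pool.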

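Q: Does auto routing add latency? A: Negligibly. The routing decision adds 10-50 ms before the actual call. The fastest tier (Llama on Groq for chit-chat) often more than offsets that overhead.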
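To check the latency claim against your own traffic, the same prompt can be timed with auto routing and with a pinned model. A minimal sketch: `timed_call` is a hypothetical helper that measures end-to-end wall-clock time, so it cannot isolate the routing step on its own. It is demonstrated here on a local function so the snippet runs offline.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


# Offline demonstration with a local function instead of an API call.
result, elapsed_ms = timed_call(sum, [1, 2, 3])
print(f"result={result}, elapsed={elapsed_ms:.3f} ms")
```

With a real client, `timed_call(client.chat.completions.create, model="openrouter/auto", messages=msgs)` could be compared against the same call with a pinned model, averaged over several runs to smooth out network noise.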

Q: Can I see what auto picked? A: Yes. response.model returns the model actually used. Log it for analysis. PostHog LLM Observability records it as a property of each call.
