Quick Use
- Have an OpenRouter API key
- In any OpenAI SDK call, use model="openrouter/auto"
- Optional: pass extra_body={"models": [...], "provider": {"sort": "price"}} to constrain which models and providers the router may pick
Intro
OpenRouter Auto Routing picks the best model for each prompt automatically: it analyzes the task, then routes based on a balance of cost, latency, and capability. Cheap chitchat goes to Llama 3.3 on Groq; complex code goes to Claude Sonnet; long-context retrieval goes to Gemini Pro. Best for: apps with diverse query types where one fixed model is either too expensive or too weak. Works with: any OpenAI SDK pointing at OpenRouter. Setup time: 1 minute.
Use auto routing
import os

from openai import OpenAI

# Point the standard OpenAI SDK at OpenRouter
client = OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key=os.environ["OPENROUTER_API_KEY"],
)
# Each call is routed independently
quick = client.chat.completions.create(
model="openrouter/auto",
messages=[{"role": "user", "content": "What is 2+2?"}],
)
# → routed to a cheap, fast model (e.g. Llama 3.3 on Groq, ~$0.0001)
complex_task = client.chat.completions.create(
model="openrouter/auto",
messages=[{"role": "user", "content": "Refactor this 500-line Python file..."}],
)
# → routed to a coding model (e.g. Claude Sonnet, ~$0.05)
print(quick.model) # "meta-llama/llama-3.3-70b-instruct"
print(complex_task.model)  # "anthropic/claude-3.5-sonnet"

The actual model used is in response.model. Log it with PostHog or Helicone for cost analysis.
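A minimal local version of that logging, if you just want a tally before wiring up a full observability tool. This is a sketch: tracked_completion and the two counters are illustrative names, not part of any SDK; the usage fields come from the standard OpenAI SDK response.

import collections

route_counts = collections.Counter()
token_counts = collections.Counter()

def tracked_completion(client, messages, **kwargs):
    # Same auto-routed call as above, but record which model was chosen
    # and how many tokens it used, for later cost analysis.
    response = client.chat.completions.create(
        model="openrouter/auto",
        messages=messages,
        **kwargs,
    )
    route_counts[response.model] += 1
    if response.usage is not None:
        token_counts[response.model] += response.usage.total_tokens
    return response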
Constrain the auto-pool
extra_body = {
"models": [
"anthropic/claude-3.5-sonnet",
"anthropic/claude-3.5-haiku",
"openai/gpt-4o-mini",
],
# Auto picks the best from THIS list
}
response = client.chat.completions.create(
model="openrouter/auto",
messages=[...],
extra_body=extra_body,
)

Useful when you have data-residency or compliance constraints and only certain providers are allowed.
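One way to keep that constraint in a single place is a small wrapper that always applies your approved list. A sketch only: APPROVED_MODELS and compliant_completion are illustrative names, and the list itself is just an example.

# Hypothetical allow-list of models cleared for your compliance requirements.
APPROVED_MODELS = [
    "anthropic/claude-3.5-sonnet",
    "anthropic/claude-3.5-haiku",
    "openai/gpt-4o-mini",
]

def compliant_completion(client, messages, **kwargs):
    # Auto routing still picks per prompt, but only from APPROVED_MODELS.
    return client.chat.completions.create(
        model="openrouter/auto",
        messages=messages,
        extra_body={"models": APPROVED_MODELS},
        **kwargs,
    )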
Provider preferences
extra_body = {
"models": ["openrouter/auto"],
"provider": {
"sort": "throughput", # "price" | "latency" | "throughput"
"data_collection": "deny",
"allow_fallbacks": True,
},
}

- sort: price → cheapest provider that meets the prompt's needs.
- sort: latency → fastest first-byte time.
- sort: throughput → highest tokens/sec for streaming.
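For example, a streaming call that prefers high-throughput providers could look like this. A sketch reusing the client from above; the prompt is arbitrary.

stream = client.chat.completions.create(
    model="openrouter/auto",
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet in detail."}],
    stream=True,
    extra_body={
        "provider": {
            "sort": "throughput",   # prefer the provider with the highest tokens/sec
            "allow_fallbacks": True,
        },
    },
)
for chunk in stream:
    # Each chunk carries an incremental delta; print tokens as they arrive.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)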
When NOT to use auto
- You have benchmarked your prompts on one specific model — pinning is safer
- Compliance requires a specific deployment region (provider-pin instead)
- You need exact cost predictability (auto = variable cost)
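In those cases, pinning is just naming the model directly instead of openrouter/auto. A sketch using the same client as above; the model string is one of the examples used earlier.

# Pinned: every call goes to the same model, so cost and behavior are predictable.
pinned = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Refactor this 500-line Python file..."}],
)
print(pinned.model)  # "anthropic/claude-3.5-sonnet"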
FAQ
Q: How accurate is auto routing? A: Good for low-stakes tasks, mediocre for nuanced ones. The router uses heuristics + a fast classifier on the prompt. For prompts at the boundary (medium complexity) it can pick a model that's slightly under-spec'd. Constrain the pool when stakes matter.
Q: Does auto routing increase latency? A: Negligibly — the routing decision adds ~10-50ms before the actual call. The fastest tier (Groq Llama for chitchat) often more than makes up for it.
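If you want to check the overhead on your own prompts, a rough comparison is easy to script. Note this sketch times full round trips with the client from above, so it includes each model's generation speed, not just the routing decision.

import time

def time_call(model_name):
    # Time one complete request against the given model slug.
    start = time.perf_counter()
    client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": "What is 2+2?"}],
    )
    return time.perf_counter() - start

print("auto:  ", time_call("openrouter/auto"))
print("pinned:", time_call("meta-llama/llama-3.3-70b-instruct"))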
Q: Can I see what auto picked?
A: Yes — response.model returns the actual model used. Log this for analysis. PostHog LLM Observability shows it as a property on each call.
Source & Thanks
Built by OpenRouter. Commercial product.
openrouter.ai/docs — Auto Routing docs