Workflows · May 8, 2026 · 4 min read

OpenRouter Auto Routing — Pick the Best Model per Query

OpenRouter Auto routes each query to the model that best balances cost, latency, and capability. Set model=openrouter/auto and the router decides per prompt.

Agent ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, install contract, metadata JSON, adapter-aware plan, and raw content links so agents can judge fit, risk, and next actions.

Stage only · 17/100
Agent surface: Any MCP/CLI agent
Kind: Skill
Install: Stage only
Trust: New
Entrypoint: Asset
Universal CLI install command
npx tokrepo install f9652b65-04fc-40e1-9fec-fa3ef2ec5921
Intro

OpenRouter Auto Routing picks the best model for each prompt automatically — analyzing the task, then routing to the model that best balances cost, latency, and capability. Cheap chitchat goes to Llama 3.3 on Groq; complex code goes to Claude Sonnet; long-context retrieval goes to Gemini Pro. Best for: apps with diverse query types where one fixed model is either too expensive or too weak. Works with: any OpenAI SDK pointing at OpenRouter. Setup time: 1 minute.


Use auto routing

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Each call is routed independently
quick = client.chat.completions.create(
    model="openrouter/auto",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
# → routed to a cheap, fast model (e.g. Llama 3.3 on Groq, ~$0.0001)

complex_task = client.chat.completions.create(  # "complex" would shadow the built-in complex()
    model="openrouter/auto",
    messages=[{"role": "user", "content": "Refactor this 500-line Python file..."}],
)
# → routed to a coding model (e.g. Claude Sonnet, ~$0.05)

print(quick.model)         # "meta-llama/llama-3.3-70b-instruct"
print(complex_task.model)  # "anthropic/claude-3.5-sonnet"

The actual model used is in response.model. Log it with PostHog or Helicone for cost analysis.
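A minimal sketch of that logging with the stdlib, if you want it without a vendor (log_routed_call is an illustrative helper, not part of any SDK):

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llm_router")

def log_routed_call(response):
    # response.model holds the concrete model Auto selected for this call
    usage = response.usage
    logger.info(
        "model=%s prompt_tokens=%s completion_tokens=%s",
        response.model,
        usage.prompt_tokens if usage else None,
        usage.completion_tokens if usage else None,
    )

log_routed_call(quick)         # model=meta-llama/llama-3.3-70b-instruct ...
log_routed_call(complex_task)  # model=anthropic/claude-3.5-sonnet ...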

Constrain the auto-pool

extra_body = {
    "models": [
        "anthropic/claude-3.5-sonnet",
        "anthropic/claude-3.5-haiku",
        "openai/gpt-4o-mini",
    ],
    # Auto picks the best from THIS list
}

response = client.chat.completions.create(
    model="openrouter/auto",
    messages=[...],
    extra_body=extra_body,
)

Useful when you have data-residency or compliance constraints — only certain providers allowed.
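A small wrapper keeps the allow-list in one place. A sketch assuming the client from above; constrained_auto and ALLOWED_MODELS are illustrative names, not OpenRouter API fields:

# Illustrative helper: every call site routes within the same approved pool
ALLOWED_MODELS = [
    "anthropic/claude-3.5-sonnet",
    "anthropic/claude-3.5-haiku",
    "openai/gpt-4o-mini",
]

def constrained_auto(messages):
    return client.chat.completions.create(
        model="openrouter/auto",
        messages=messages,
        extra_body={"models": ALLOWED_MODELS},
    )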

Provider preferences

extra_body = {
    "models": ["openrouter/auto"],
    "provider": {
        "sort": "throughput",        # "price" | "latency" | "throughput"
        "data_collection": "deny",
        "allow_fallbacks": True,
    },
}

  • sort: "price" → cheapest provider that meets the prompt's needs
  • sort: "latency" → fastest first-byte time
  • sort: "throughput" → highest tokens/sec for streaming
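For a streaming chat UI, for instance, latency sort is the natural choice. A sketch reusing the client from above:

# Streaming chat: prioritize time-to-first-byte over price
stream = client.chat.completions.create(
    model="openrouter/auto",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
    extra_body={"provider": {"sort": "latency", "allow_fallbacks": True}},
)
for chunk in stream:
    # Some chunks may carry no choices or an empty delta; skip those
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)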

When NOT to use auto

  • You have benchmarked your prompts on one specific model — pinning is safer
  • Compliance requires a specific deployment region (provider-pin instead)
  • You need exact cost predictability (auto = variable cost)

FAQ

Q: How accurate is auto routing?
A: Good for low-stakes tasks, mediocre for nuanced ones. The router uses heuristics + a fast classifier on the prompt. For prompts at the boundary (medium complexity) it can pick a model that's slightly under-spec'd. Constrain the pool when stakes matter.

Q: Does auto routing increase latency?
A: Negligibly — the routing decision adds ~10-50ms before the actual call. The fastest tier (Groq Llama for chitchat) often more than makes up for it.
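To verify the overhead on your own traffic, a quick measurement sketch (the ~10-50ms figure is OpenRouter's ballpark, worth checking per workload):

import time

start = time.perf_counter()
resp = client.chat.completions.create(
    model="openrouter/auto",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
print(f"{resp.model}: {time.perf_counter() - start:.3f}s end-to-end")
# Re-run with model=resp.model pinned directly to isolate the routing overhead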

Q: Can I see what auto picked?
A: Yes — response.model returns the actual model used. Log this for analysis. PostHog LLM Observability shows it as a property on each call.


Quick Use

  1. Have an OpenRouter API key
  2. In any OpenAI SDK call, use model="openrouter/auto"
  3. Optional: pass extra_body={"models": [...], "provider": {"sort": "price"}} to constrain

Source & Thanks

Built by OpenRouter. Commercial product.

openrouter.ai/docs — Auto Routing docs

🙏
