Workflows · May 8, 2026 · 4 min read

OpenRouter Auto Routing — Pick the Best Model per Query

OpenRouter Auto routes each query to the optimal model, balancing cost, latency, and capability. Set model=openrouter/auto and the router decides per prompt.

Agent-ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and the raw content to help agents assess fit, risk, and next actions.

Stage only · 17/100
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Stage only
Trust: New
Entry point: Asset
Universal CLI command:
npx tokrepo install f9652b65-04fc-40e1-9fec-fa3ef2ec5921
Introduction

OpenRouter Auto Routing picks the best model for each prompt automatically — analyzing the task, then routing to a balance of cost, latency, and capability. Cheap chitchat goes to Llama 3.3 on Groq; complex code goes to Claude Sonnet; long-context retrieval goes to Gemini Pro. Best for: apps with diverse query types where one fixed model is either too expensive or too weak. Works with: any OpenAI SDK pointing at OpenRouter. Setup time: 1 minute.


Use auto routing

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Each call is routed independently
quick = client.chat.completions.create(
    model="openrouter/auto",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
# → routed to a cheap, fast model (e.g. Llama 3.3 on Groq, ~$0.0001)

complex_task = client.chat.completions.create(
    model="openrouter/auto",
    messages=[{"role": "user", "content": "Refactor this 500-line Python file..."}],
)
# → routed to a coding model (e.g. Claude Sonnet, ~$0.05)

print(quick.model)         # "meta-llama/llama-3.3-70b-instruct"
print(complex_task.model)  # "anthropic/claude-3.5-sonnet"

The actual model used is in response.model. Log it with PostHog or Helicone for cost analysis.
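If you just want quick visibility before wiring up an observability tool, a plain log line per call already captures the routing and token data. A minimal sketch using only the standard library, reusing the quick and complex_task responses from above (the log_routed_call name is illustrative):

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llm_routing")

def log_routed_call(response, label):
    # response.model is the model auto routing actually selected for this call
    usage = response.usage
    logger.info(
        "label=%s model=%s prompt_tokens=%s completion_tokens=%s",
        label,
        response.model,
        usage.prompt_tokens if usage else None,
        usage.completion_tokens if usage else None,
    )

log_routed_call(quick, "chitchat")
log_routed_call(complex_task, "refactor")

Aggregating these logs by model gives a rough per-tier cost breakdown without any extra dependencies.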

Constrain the auto-pool

extra_body = {
    "models": [
        "anthropic/claude-3.5-sonnet",
        "anthropic/claude-3.5-haiku",
        "openai/gpt-4o-mini",
    ],
    # Auto picks the best from THIS list
}

response = client.chat.completions.create(
    model="openrouter/auto",
    messages=[...],
    extra_body=extra_body,
)

Useful when you have data-residency or compliance constraints — only certain providers allowed.
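To keep that constraint in one place, you can wrap the call so every request goes through the same approved pool. A minimal sketch, reusing the client from above; APPROVED_MODELS and routed_completion are illustrative names, not part of the OpenRouter API:

# Hypothetical allow-list; swap in whatever your compliance review approves.
APPROVED_MODELS = [
    "anthropic/claude-3.5-sonnet",
    "anthropic/claude-3.5-haiku",
    "openai/gpt-4o-mini",
]

def routed_completion(messages, **kwargs):
    # Auto routing still picks per prompt, but only within the approved pool.
    return client.chat.completions.create(
        model="openrouter/auto",
        messages=messages,
        extra_body={"models": APPROVED_MODELS},
        **kwargs,
    )

response = routed_completion(
    [{"role": "user", "content": "Summarize this contract clause..."}]
)
print(response.model)  # one of APPROVED_MODELS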

Provider preferences

extra_body = {
    "models": ["openrouter/auto"],
    "provider": {
        "sort": "throughput",        # "price" | "latency" | "throughput"
        "data_collection": "deny",
        "allow_fallbacks": True,
    },
}

  • sort: price → cheapest provider that meets the prompt's needs
  • sort: latency → fastest time to first byte
  • sort: throughput → highest tokens/sec, best for streaming

When NOT to use auto

  • You have benchmarked your prompts on one specific model; pinning is safer (see the pinning sketch after this list)
  • Compliance requires a specific deployment region (provider-pin instead)
  • You need exact cost predictability (auto = variable cost)
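If any of these apply, pin the model explicitly and keep the rest of the code unchanged. A minimal sketch against the same client; the model ID is just an example:

# Pinned: every call uses exactly this model, so cost and behavior are predictable.
pinned = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Refactor this 500-line Python file..."}],
)
print(pinned.model)  # always the pinned model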

FAQ

Q: How accurate is auto routing? A: Good for low-stakes tasks, mediocre for nuanced ones. The router uses heuristics + a fast classifier on the prompt. For prompts at the boundary (medium complexity) it can pick a model that's slightly under-spec'd. Constrain the pool when stakes matter.

Q: Does auto routing increase latency? A: Negligibly — the routing decision adds ~10-50ms before the actual call. The fastest tier (Groq Llama for chitchat) often more than makes up for it.

Q: Can I see what auto picked? A: Yes — response.model returns the actual model used. Log this for analysis. PostHog LLM Observability shows it as a property on each call.


Quick Use

  1. Have an OpenRouter API key
  2. In any OpenAI SDK call, use model="openrouter/auto"
  3. Optional: pass extra_body={"models": [...], "provider": {"sort": "price"}} to constrain (combined in the sketch below)
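
Putting the three steps together, a minimal end-to-end sketch (the prompt, pool, and sort values are just examples):

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="openrouter/auto",
    messages=[{"role": "user", "content": "Draft a release note for v2.3"}],
    extra_body={
        "models": ["anthropic/claude-3.5-haiku", "openai/gpt-4o-mini"],
        "provider": {"sort": "price"},
    },
)
print(response.model)  # the model auto routing actually picked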

Source & Thanks

Built by OpenRouter. Commercial product.

openrouter.ai/docs — Auto Routing docs

🙏
