LiteLLM Router — Smart Failover & Load Balancing in Python

Quick Use

pip install litellm
Set ANTHROPIC_API_KEY and OPENAI_API_KEY env vars
Drop the Router snippet below into your Python app
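
The Router snippet reads both keys with os.environ[...], which raises KeyError when a variable is missing. A minimal pre-flight check, assuming only the two variable names listed above; missing_keys is a hypothetical helper, not part of LiteLLM:

```python
import os

# Variable names taken from the Quick Use steps above.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Calling missing_keys() before building the Router gives a readable list of what still needs to be exported instead of a bare KeyError.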

Intro

LiteLLM Router is the in-process Python counterpart of the LiteLLM Proxy: the same routing logic (failover, load balancing, A/B splits, latency-aware routing) with no proxy server to run. Import the Router class, define your model list, and call .completion().

Best for: Python apps that want LiteLLM's resilience without running a separate Docker proxy.
Works with: any Python ≥3.8 project, sync or async.
Setup time: about 2 minutes (pip install litellm plus roughly 20 lines).

Hello world

import os

from litellm import Router

router = Router(model_list=[
    # Primary
    {
        "model_name": "claude-fast",
        "litellm_params": {
            "model": "anthropic/claude-3-5-haiku-20241022",
            "api_key": os.environ["ANTHROPIC_API_KEY"],
            "rpm": 1000,  # requests per minute
        },
    },
    # Fallback
    {
        "model_name": "claude-fast",  # same name = same pool
        "litellm_params": {
            "model": "openai/gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
            "rpm": 5000,
        },
    },
])

# Router picks one based on load + health
resp = router.completion(
    model="claude-fast",
    messages=[{"role": "user", "content": "Hello"}],
)

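Because both deployments share the model name claude-fast, the router treats them as one pool: it spreads requests across them and, if a call to Anthropic fails or is rate-limited, can retry on GPT-4o-mini (subject to the router's retry and cooldown settings). The caller gets a successful response as long as one deployment is healthy. For a strict primary-then-backup order, LiteLLM also offers a separate fallbacks setting on the Router.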
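
The failover idea itself can be sketched in plain Python. This is a conceptual illustration, not LiteLLM's implementation: call_with_failover and the two stand-in providers are hypothetical names, and a real router would also handle cooldowns and narrower error types.

```python
def call_with_failover(deployments, request):
    """Try each deployment in order and return the first successful result.

    `deployments` is a list of (name, callable) pairs. Each callable takes
    the request and either returns a response or raises an exception.
    """
    errors = []
    for name, call in deployments:
        try:
            return name, call(request)
        except Exception as exc:  # sketch only: catch narrower errors in practice
            errors.append((name, exc))
    raise RuntimeError(f"all deployments failed: {errors}")


# Stand-in providers (hypothetical): the first is rate-limited, the second works.
def flaky_provider(request):
    raise TimeoutError("rate limited")


def healthy_provider(request):
    return f"echo: {request}"


served_by, response = call_with_failover(
    [("primary", flaky_provider), ("fallback", healthy_provider)],
    "Hello",
)
```

The caller only sees the final response; which deployment served it is returned alongside so it can be logged.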

Latency-based routing

router = Router(
    model_list=[...],
    routing_strategy="latency-based-routing",
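    routing_strategy_args={"ttl": 25},  # keep latency measurements for 25s
)

The router tracks how long recent requests to each deployment took, keeps those measurements for ttl seconds, and routes new requests to the fastest deployment. This matters most for user-facing apps where tail latency (p99) is critical.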
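
To make the TTL idea concrete, here is a small sketch of latency-aware selection over a sliding time window. It illustrates the concept only and is not how LiteLLM implements it; LatencyTracker and its methods are hypothetical names.

```python
import time
from collections import defaultdict, deque


class LatencyTracker:
    """Keep recent latency samples per deployment and pick the fastest."""

    def __init__(self, ttl=25.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so the window can be tested
        self.samples = defaultdict(deque)  # name -> deque of (timestamp, seconds)

    def record(self, name, seconds):
        """Store one observed response time for a deployment."""
        self.samples[name].append((self.clock(), seconds))

    def _average(self, name):
        cutoff = self.clock() - self.ttl
        window = self.samples[name]
        while window and window[0][0] < cutoff:
            window.popleft()  # drop samples older than the TTL
        if not window:
            return 0.0  # no recent data: treat as fastest so it gets tried again
        return sum(seconds for _, seconds in window) / len(window)

    def pick(self, names):
        """Return the deployment with the lowest average recent latency."""
        return min(names, key=self._average)
```

Treating a deployment with no recent samples as fastest is one possible design choice: it ensures an endpoint that recovered after an outage gets re-measured rather than being starved forever.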

A/B testing

# Weights are relative and set inside litellm_params; they apply to the
# default simple-shuffle routing strategy.
router = Router(model_list=[
    {
        "model_name": "experimental",
        "litellm_params": {
            "model": "openai/gpt-4o",
            "weight": 1,  # ~10% of traffic
        },
    },
    {
        "model_name": "experimental",
        "litellm_params": {
            "model": "anthropic/claude-3-5-sonnet-20241022",
            "weight": 9,  # ~90% of traffic
        },
    },
])

Track quality metrics per serving model (the response's model field reports which model answered) to decide which variant to graduate.
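
One way to keep those per-model metrics is a small in-memory scoreboard. This is a sketch under assumptions: ABScoreboard is a hypothetical helper, the scores are whatever quality signal you already compute (thumbs-up rate, eval score), and the model names are placeholders.

```python
from collections import defaultdict


class ABScoreboard:
    """Collect quality scores per model and summarize them."""

    def __init__(self):
        self._scores = defaultdict(list)  # model name -> list of scores

    def record(self, model, score):
        """Store one quality score for the model that served a response."""
        self._scores[model].append(score)

    def summary(self):
        """Return sample count and mean score for each model seen so far."""
        return {
            model: {"n": len(scores), "mean": sum(scores) / len(scores)}
            for model, scores in self._scores.items()
        }


board = ABScoreboard()
board.record("gpt-4o", 0.8)
board.record("claude-3-5-sonnet", 0.9)
board.record("claude-3-5-sonnet", 0.7)
```

With a 10/90 split the smaller arm accumulates samples slowly, so compare the n values before reading much into a difference in means.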

Async support

import asyncio

# Reuses the `router` defined in the Hello world example.
async def main():
    resp = await router.acompletion(
        model="claude-fast",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)

asyncio.run(main())

FAQ

Q: Router vs Proxy: which should I use? A: Use Router for a single Python app (no extra container). Use Proxy for multi-team or multi-language setups, since any client that speaks the OpenAI format can call it. Same routing logic, different deployment. Many teams use Router for prod and Proxy for dev/local.

Q: Does Router track costs? A: Yes — the response includes _response_ms and cost in _hidden_params. For persistent tracking, point Router at a callback (e.g. Langfuse, Helicone, OTEL) — config is one line.

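Q: Can I add custom routing logic? A: Yes. LiteLLM's documented route is a custom strategy class: subclass CustomRoutingStrategyBase from litellm.router, implement its deployment-selection methods, and register it with router.set_custom_routing_strategy(...). Check the docs for the exact method signatures in your version. Useful for rules like 'always use Claude for queries with PII redaction enabled' or 'route by user tier'.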
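
For simple rules like the two above, a lighter alternative is to choose the model group at the call site and pass it as model= to router.completion(). A minimal sketch; choose_model_group and the group names "claude-pii", "premium-pool", and "default-pool" are hypothetical and would need matching model_name entries in your model_list:

```python
def choose_model_group(user_tier, pii_redaction_enabled):
    """Return the model group name to request (hypothetical rules)."""
    if pii_redaction_enabled:
        return "claude-pii"  # PII rule takes priority over tier
    if user_tier == "premium":
        return "premium-pool"
    return "default-pool"
```

This keeps business rules in ordinary application code, while the Router still handles load balancing and retries within whichever group is chosen.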

Source & Thanks

Built by BerriAI. Licensed under MIT.

BerriAI/litellm — ⭐ 17,000+
