Esta página se muestra en inglés. Una traducción al español está en curso.
ScriptsMay 7, 2026·4 min de lectura

LiteLLM Router — Smart Failover & Load Balancing in Python

LiteLLM Router routes LLM endpoints with retry, fallback, latency-based, weighted A/B. Pure Python — drop into any codebase, no separate proxy needed.

Listo para agents

Staging seguro para este activo

Este activo primero queda en staging. El prompt copiado pide inspeccionar los archivos staged antes de activar scripts, config MCP o config global.

Stage only · 29/100Política: staging
Superficie agent
Cualquier agent MCP/CLI
Tipo
Skill
Instalación
Stage only
Confianza
Confianza: Community
Entrada
Asset
Comando de staging seguro
npx -y tokrepo@latest install fd0004e9-6d6b-4e72-b6b1-643c80dad027 --target codex

Primero deja archivos en staging; la activación requiere revisar el README y el plan staged.

Introducción

LiteLLM Router is the Python-native version of the LiteLLM Proxy — same routing logic (failover, load balance, A/B, latency-aware), no proxy server required. Import the Router class, define your model list, call .completion(). Best for: Python apps where you want LiteLLM's resilience without running a separate Docker proxy. Works with: any Python ≥3.8 project, async + sync. Setup time: 2 minutes (pip install litellm + 20 lines).


Hello world

from litellm import Router

router = Router(model_list=[
    # Primary
    {
        "model_name": "claude-fast",
        "litellm_params": {
            "model": "anthropic/claude-3-5-haiku-20241022",
            "api_key": os.environ["ANTHROPIC_API_KEY"],
            "rpm": 1000,  # requests per minute
        },
    },
    # Fallback
    {
        "model_name": "claude-fast",  # same name = same pool
        "litellm_params": {
            "model": "openai/gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
            "rpm": 5000,
        },
    },
])

# Router picks one based on load + health
resp = router.completion(
    model="claude-fast",
    messages=[{"role": "user", "content": "Hello"}],
)

If Anthropic is down or rate-limited, the router silently falls back to GPT-4o-mini. The caller gets a successful response either way.

Latency-based routing

router = Router(
    model_list=[...],
    routing_strategy="latency-based-routing",
    routing_strategy_args={"ttl": 25},  # re-evaluate every 25s
)

The router pings each endpoint every TTL seconds, then routes new requests to the fastest one. Critical for user-facing apps where p99 matters.

A/B testing

router = Router(model_list=[
    {
        "model_name": "experimental",
        "litellm_params": {"model": "openai/gpt-4o"},
        "model_info": {"weight": 0.1},  # 10% traffic
    },
    {
        "model_name": "experimental",
        "litellm_params": {"model": "anthropic/claude-3-5-sonnet-20241022"},
        "model_info": {"weight": 0.9},  # 90% traffic
    },
])

Track quality metrics on responses by model_used (returned in the response) to decide which to graduate.

Async support

import asyncio

async def main():
    resp = await router.acompletion(
        model="claude-fast",
        messages=[{"role": "user", "content": "Hello"}],
    )

asyncio.run(main())

FAQ

Q: Router vs Proxy — which should I use? A: Router for single Python app (no extra container). Proxy for multi-team / multi-language (any client speaking OpenAI format can use it). Same routing logic, different deployment. Many teams use Router for prod and Proxy for dev/local.

Q: Does Router track costs? A: Yes — the response includes _response_ms and cost in _hidden_params. For persistent tracking, point Router at a callback (e.g. Langfuse, Helicone, OTEL) — config is one line.

Q: Can I add custom routing logic? A: Yes — routing_strategy='custom' and pass a callable. Useful for rules like 'always use Claude for queries with PII redaction enabled' or 'route by user tier'.


Quick Use

  1. pip install litellm
  2. Set ANTHROPIC_API_KEY and OPENAI_API_KEY env vars
  3. Drop the Router snippet below into your Python app

Intro

LiteLLM Router is the Python-native version of the LiteLLM Proxy — same routing logic (failover, load balance, A/B, latency-aware), no proxy server required. Import the Router class, define your model list, call .completion(). Best for: Python apps where you want LiteLLM's resilience without running a separate Docker proxy. Works with: any Python ≥3.8 project, async + sync. Setup time: 2 minutes (pip install litellm + 20 lines).


Hello world

from litellm import Router

router = Router(model_list=[
    # Primary
    {
        "model_name": "claude-fast",
        "litellm_params": {
            "model": "anthropic/claude-3-5-haiku-20241022",
            "api_key": os.environ["ANTHROPIC_API_KEY"],
            "rpm": 1000,  # requests per minute
        },
    },
    # Fallback
    {
        "model_name": "claude-fast",  # same name = same pool
        "litellm_params": {
            "model": "openai/gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
            "rpm": 5000,
        },
    },
])

# Router picks one based on load + health
resp = router.completion(
    model="claude-fast",
    messages=[{"role": "user", "content": "Hello"}],
)

If Anthropic is down or rate-limited, the router silently falls back to GPT-4o-mini. The caller gets a successful response either way.

Latency-based routing

router = Router(
    model_list=[...],
    routing_strategy="latency-based-routing",
    routing_strategy_args={"ttl": 25},  # re-evaluate every 25s
)

The router pings each endpoint every TTL seconds, then routes new requests to the fastest one. Critical for user-facing apps where p99 matters.

A/B testing

router = Router(model_list=[
    {
        "model_name": "experimental",
        "litellm_params": {"model": "openai/gpt-4o"},
        "model_info": {"weight": 0.1},  # 10% traffic
    },
    {
        "model_name": "experimental",
        "litellm_params": {"model": "anthropic/claude-3-5-sonnet-20241022"},
        "model_info": {"weight": 0.9},  # 90% traffic
    },
])

Track quality metrics on responses by model_used (returned in the response) to decide which to graduate.

Async support

import asyncio

async def main():
    resp = await router.acompletion(
        model="claude-fast",
        messages=[{"role": "user", "content": "Hello"}],
    )

asyncio.run(main())

FAQ

Q: Router vs Proxy — which should I use? A: Router for single Python app (no extra container). Proxy for multi-team / multi-language (any client speaking OpenAI format can use it). Same routing logic, different deployment. Many teams use Router for prod and Proxy for dev/local.

Q: Does Router track costs? A: Yes — the response includes _response_ms and cost in _hidden_params. For persistent tracking, point Router at a callback (e.g. Langfuse, Helicone, OTEL) — config is one line.

Q: Can I add custom routing logic? A: Yes — routing_strategy='custom' and pass a callable. Useful for rules like 'always use Claude for queries with PII redaction enabled' or 'route by user tier'.


Source & Thanks

Built by BerriAI. Licensed under MIT.

BerriAI/litellm — ⭐ 17,000+

🙏

Fuente y agradecimientos

Built by BerriAI. Licensed under MIT.

BerriAI/litellm — ⭐ 17,000+

Discusión

Inicia sesión para unirte a la discusión.
Aún no hay comentarios. Sé el primero en compartir tus ideas.

Activos relacionados