Scripts · May 7, 2026 · 4 min read

LiteLLM Router — Smart Failover & Load Balancing in Python

LiteLLM Router routes requests across multiple LLM endpoints with retries, fallbacks, latency-based routing, and weighted A/B splits. Pure Python — drop it into any codebase, no separate proxy needed.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, a per-adapter plan, and the raw content so agents can evaluate compatibility, risk, and next steps.

Stage only · 17/100

Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Stage only
Trust: New
Entry: Asset

Universal CLI command
npx tokrepo install fd0004e9-6d6b-4e72-b6b1-643c80dad027
Introduction

LiteLLM Router is the Python-native version of the LiteLLM Proxy — same routing logic (failover, load balance, A/B, latency-aware), no proxy server required. Import the Router class, define your model list, call .completion(). Best for: Python apps where you want LiteLLM's resilience without running a separate Docker proxy. Works with: any Python ≥3.8 project, async + sync. Setup time: 2 minutes (pip install litellm + 20 lines).


Hello world

import os

from litellm import Router

router = Router(model_list=[
    # Primary
    {
        "model_name": "claude-fast",
        "litellm_params": {
            "model": "anthropic/claude-3-5-haiku-20241022",
            "api_key": os.environ["ANTHROPIC_API_KEY"],
            "rpm": 1000,  # requests per minute
        },
    },
    # Fallback
    {
        "model_name": "claude-fast",  # same name = same pool
        "litellm_params": {
            "model": "openai/gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
            "rpm": 5000,
        },
    },
])

# Router picks one based on load + health
resp = router.completion(
    model="claude-fast",
    messages=[{"role": "user", "content": "Hello"}],
)

If Anthropic is down or rate-limited, the router silently falls back to GPT-4o-mini. The caller gets a successful response either way.
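
The Router constructor also takes explicit retry and fallback settings if you want tighter control over that behavior. A minimal sketch, assuming the documented num_retries, timeout, cooldown_time, and fallbacks parameters (verify against your installed litellm version; the "gpt-backup" pool name is a hypothetical second group):

router = Router(
    model_list=[...],                 # same deployments as above, plus a "gpt-backup" pool
    num_retries=2,                    # retry a failed call before failing over
    timeout=30,                       # per-request timeout, in seconds
    cooldown_time=60,                 # bench an unhealthy deployment for 60 seconds
    fallbacks=[{"claude-fast": ["gpt-backup"]}],  # hypothetical cross-pool fallback order
)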

Latency-based routing

router = Router(
    model_list=[...],
    routing_strategy="latency-based-routing",
    routing_strategy_args={"ttl": 25},  # re-evaluate every 25s
)

The router keeps a rolling record of each deployment's observed latency (refreshed every TTL seconds) and sends new requests to the fastest one. Critical for user-facing apps where p99 matters.
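
To see which deployment is winning, log per-request latency. The FAQ below notes that responses expose a _response_ms field; a quick sketch assuming that field is present on the response object:

# Warm up the latency stats and print what the router observes.
# Assumes resp._response_ms is populated (see the cost-tracking FAQ below).
for _ in range(5):
    resp = router.completion(
        model="claude-fast",
        messages=[{"role": "user", "content": "ping"}],
    )
    print(resp.model, getattr(resp, "_response_ms", None))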

A/B testing

router = Router(model_list=[
    {
        "model_name": "experimental",
        "litellm_params": {"model": "openai/gpt-4o"},
        "model_info": {"weight": 0.1},  # 10% traffic
    },
    {
        "model_name": "experimental",
        "litellm_params": {"model": "anthropic/claude-3-5-sonnet-20241022"},
        "model_info": {"weight": 0.9},  # 90% traffic
    },
])

Track quality metrics per model (the response reports which model actually served each request) to decide which variant to graduate.
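
A sketch of that bookkeeping, assuming the served model is exposed via the response's model attribute (the standard OpenAI-format field LiteLLM populates):

from collections import defaultdict

tallies = defaultdict(lambda: {"count": 0, "thumbs_up": 0})

def record(resp, thumbs_up: bool):
    # resp.model reports the model that actually served the request
    stats = tallies[resp.model]
    stats["count"] += 1
    stats["thumbs_up"] += int(thumbs_up)

resp = router.completion(
    model="experimental",
    messages=[{"role": "user", "content": "Summarize this ticket"}],
)
record(resp, thumbs_up=True)  # wire this to your real quality signal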

Async support

import asyncio

async def main():
    resp = await router.acompletion(
        model="claude-fast",
        messages=[{"role": "user", "content": "Hello"}],
    )

asyncio.run(main())
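
Because acompletion is a coroutine, you can fan out many requests concurrently and let the router spread them across the pool. A short sketch:

async def fan_out(prompts):
    # All requests run concurrently; the router load-balances across deployments.
    tasks = [
        router.acompletion(
            model="claude-fast",
            messages=[{"role": "user", "content": p}],
        )
        for p in prompts
    ]
    return await asyncio.gather(*tasks)

results = asyncio.run(fan_out(["Hello", "Bonjour", "Hola"]))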

FAQ

Q: Router vs Proxy — which should I use? A: Router for single Python app (no extra container). Proxy for multi-team / multi-language (any client speaking OpenAI format can use it). Same routing logic, different deployment. Many teams use Router for prod and Proxy for dev/local.

Q: Does Router track costs? A: Yes — the response includes _response_ms and cost in _hidden_params. For persistent tracking, point Router at a callback (e.g. Langfuse, Helicone, OTEL) — config is one line.
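
A sketch of both options, reading the per-response cost and enabling a callback. The _hidden_params["response_cost"] key and the litellm.success_callback setting are assumptions based on LiteLLM's logging docs; confirm them against your installed version:

import litellm

litellm.success_callback = ["langfuse"]  # one-line persistent tracking (assumes Langfuse env vars are set)

resp = router.completion(
    model="claude-fast",
    messages=[{"role": "user", "content": "Hello"}],
)
print(getattr(resp, "_response_ms", None))       # per-request latency
print(resp._hidden_params.get("response_cost"))  # assumed key name for cost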

Q: Can I add custom routing logic? A: Yes — routing_strategy='custom' and pass a callable. Useful for rules like 'always use Claude for queries with PII redaction enabled' or 'route by user tier'.
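
The exact registration hook for custom routing varies by litellm version, so the snippet below is only a sketch of the decision logic itself (a hypothetical user-tier rule), not of the registration API; see the Router docs for how to plug it in:

def pick_model(user: dict) -> str:
    # Hypothetical rule: paid users get the experimental pool, everyone else the cheap one.
    return "experimental" if user.get("tier") == "pro" else "claude-fast"

resp = router.completion(
    model=pick_model({"tier": "pro"}),
    messages=[{"role": "user", "content": "Hello"}],
)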


Quick Use

  1. pip install litellm
  2. Set ANTHROPIC_API_KEY and OPENAI_API_KEY env vars
  3. Drop the Router snippet above into your Python app



Source & Thanks

Built by BerriAI. Licensed under MIT.

BerriAI/litellm — ⭐ 17,000+

