Scripts · May 7, 2026 · 4 min read

LiteLLM Router — Smart Failover & Load Balancing in Python

LiteLLM Router routes requests across multiple LLM endpoints with retries, fallbacks, latency-based routing, and weighted A/B splits. Pure Python — drop it into any codebase, no separate proxy needed.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, a per-adapter plan, and the raw content so agents can evaluate compatibility, risk, and next steps.

Stage only · 17/100

Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Stage only
Trust: New
Entry: Asset

Universal CLI command
npx tokrepo install fd0004e9-6d6b-4e72-b6b1-643c80dad027
Introduction

LiteLLM Router is the Python-native version of the LiteLLM Proxy — same routing logic (failover, load balance, A/B, latency-aware), no proxy server required. Import the Router class, define your model list, call .completion(). Best for: Python apps where you want LiteLLM's resilience without running a separate Docker proxy. Works with: any Python ≥3.8 project, async + sync. Setup time: 2 minutes (pip install litellm + 20 lines).


Hello world

import os

from litellm import Router

router = Router(model_list=[
    # Primary
    {
        "model_name": "claude-fast",
        "litellm_params": {
            "model": "anthropic/claude-3-5-haiku-20241022",
            "api_key": os.environ["ANTHROPIC_API_KEY"],
            "rpm": 1000,  # requests per minute
        },
    },
    # Fallback
    {
        "model_name": "claude-fast",  # same name = same pool
        "litellm_params": {
            "model": "openai/gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
            "rpm": 5000,
        },
    },
])

# Router picks one based on load + health
resp = router.completion(
    model="claude-fast",
    messages=[{"role": "user", "content": "Hello"}],
)

If Anthropic is down or rate-limited, the router silently falls back to GPT-4o-mini. The caller gets a successful response either way.
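
The Router constructor also takes explicit retry and fallback settings if you want tighter control over that behavior. A minimal sketch, assuming the documented num_retries, timeout, cooldown_time, and fallbacks parameters (verify against your installed litellm version; the "gpt-backup" pool name is a hypothetical second group):

router = Router(
    model_list=[...],                 # same deployments as above, plus a "gpt-backup" pool
    num_retries=2,                    # retry a failed call before failing over
    timeout=30,                       # per-request timeout, in seconds
    cooldown_time=60,                 # bench an unhealthy deployment for 60 seconds
    fallbacks=[{"claude-fast": ["gpt-backup"]}],  # hypothetical cross-pool fallback order
)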

Latency-based routing

router = Router(
    model_list=[...],
    routing_strategy="latency-based-routing",
    routing_strategy_args={"ttl": 25},  # re-evaluate every 25s
)

The router keeps a rolling record of each deployment's observed latency (refreshed every TTL seconds) and sends new requests to the fastest one. Critical for user-facing apps where p99 matters.
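
To see which deployment is winning, log per-request latency. The FAQ below notes that responses expose a _response_ms field; a quick sketch assuming that field is present on the response object:

# Warm up the latency stats and print what the router observes.
# Assumes resp._response_ms is populated (see the cost-tracking FAQ below).
for _ in range(5):
    resp = router.completion(
        model="claude-fast",
        messages=[{"role": "user", "content": "ping"}],
    )
    print(resp.model, getattr(resp, "_response_ms", None))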

A/B testing

router = Router(model_list=[
    {
        "model_name": "experimental",
        "litellm_params": {"model": "openai/gpt-4o"},
        "model_info": {"weight": 0.1},  # 10% traffic
    },
    {
        "model_name": "experimental",
        "litellm_params": {"model": "anthropic/claude-3-5-sonnet-20241022"},
        "model_info": {"weight": 0.9},  # 90% traffic
    },
])

Track quality metrics per model (the response reports which model actually served each request) to decide which variant to graduate.
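
A sketch of that bookkeeping, assuming the served model is exposed via the response's model attribute (the standard OpenAI-format field LiteLLM populates):

from collections import defaultdict

tallies = defaultdict(lambda: {"count": 0, "thumbs_up": 0})

def record(resp, thumbs_up: bool):
    # resp.model reports the model that actually served the request
    stats = tallies[resp.model]
    stats["count"] += 1
    stats["thumbs_up"] += int(thumbs_up)

resp = router.completion(
    model="experimental",
    messages=[{"role": "user", "content": "Summarize this ticket"}],
)
record(resp, thumbs_up=True)  # wire this to your real quality signal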

Async support

import asyncio

async def main():
    resp = await router.acompletion(
        model="claude-fast",
        messages=[{"role": "user", "content": "Hello"}],
    )

asyncio.run(main())
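
Because acompletion is a coroutine, you can fan out many requests concurrently and let the router spread them across the pool. A short sketch:

async def fan_out(prompts):
    # All requests run concurrently; the router load-balances across deployments.
    tasks = [
        router.acompletion(
            model="claude-fast",
            messages=[{"role": "user", "content": p}],
        )
        for p in prompts
    ]
    return await asyncio.gather(*tasks)

results = asyncio.run(fan_out(["Hello", "Bonjour", "Hola"]))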

FAQ

Q: Router vs Proxy — which should I use? A: Router for single Python app (no extra container). Proxy for multi-team / multi-language (any client speaking OpenAI format can use it). Same routing logic, different deployment. Many teams use Router for prod and Proxy for dev/local.

Q: Does Router track costs? A: Yes — the response includes _response_ms and cost in _hidden_params. For persistent tracking, point Router at a callback (e.g. Langfuse, Helicone, OTEL) — config is one line.
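
A sketch of both options, reading the per-response cost and enabling a callback. The _hidden_params["response_cost"] key and the litellm.success_callback setting are assumptions based on LiteLLM's logging docs; confirm them against your installed version:

import litellm

litellm.success_callback = ["langfuse"]  # one-line persistent tracking (assumes Langfuse env vars are set)

resp = router.completion(
    model="claude-fast",
    messages=[{"role": "user", "content": "Hello"}],
)
print(getattr(resp, "_response_ms", None))       # per-request latency
print(resp._hidden_params.get("response_cost"))  # assumed key name for cost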

Q: Can I add custom routing logic? A: Yes — routing_strategy='custom' and pass a callable. Useful for rules like 'always use Claude for queries with PII redaction enabled' or 'route by user tier'.
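
The exact registration hook for custom routing varies by litellm version, so the snippet below is only a sketch of the decision logic itself (a hypothetical user-tier rule), not of the registration API; see the Router docs for how to plug it in:

def pick_model(user: dict) -> str:
    # Hypothetical rule: paid users get the experimental pool, everyone else the cheap one.
    return "experimental" if user.get("tier") == "pro" else "claude-fast"

resp = router.completion(
    model=pick_model({"tier": "pro"}),
    messages=[{"role": "user", "content": "Hello"}],
)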


Quick Use

  1. pip install litellm
  2. Set ANTHROPIC_API_KEY and OPENAI_API_KEY env vars
  3. Drop the Router snippet above into your Python app



Source & Thanks

Built by BerriAI. Licensed under MIT.

BerriAI/litellm — ⭐ 17,000+

