Scripts · May 7, 2026 · 4 min read

LiteLLM Router — Smart Failover & Load Balancing in Python

LiteLLM Router routes requests across multiple LLM endpoints with retries, fallbacks, latency-based routing, and weighted A/B splits. Pure Python — drop it into any codebase, no separate proxy needed.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and the raw content to help agents judge fit, risk, and next actions.

Stage only · 17/100
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Stage only
Trust: New
Entry point: Asset
Universal CLI command
npx tokrepo install fd0004e9-6d6b-4e72-b6b1-643c80dad027
Introduction

LiteLLM Router is the Python-native version of the LiteLLM Proxy — same routing logic (failover, load balance, A/B, latency-aware), no proxy server required. Import the Router class, define your model list, call .completion(). Best for: Python apps where you want LiteLLM's resilience without running a separate Docker proxy. Works with: any Python ≥3.8 project, async + sync. Setup time: 2 minutes (pip install litellm + 20 lines).


Hello world

import os

from litellm import Router

router = Router(model_list=[
    # Primary
    {
        "model_name": "claude-fast",
        "litellm_params": {
            "model": "anthropic/claude-3-5-haiku-20241022",
            "api_key": os.environ["ANTHROPIC_API_KEY"],
            "rpm": 1000,  # requests per minute
        },
    },
    # Fallback
    {
        "model_name": "claude-fast",  # same name = same pool
        "litellm_params": {
            "model": "openai/gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
            "rpm": 5000,
        },
    },
])

# Router picks one based on load + health
resp = router.completion(
    model="claude-fast",
    messages=[{"role": "user", "content": "Hello"}],
)

If Anthropic is down or rate-limited, the router silently falls back to GPT-4o-mini. The caller gets a successful response either way.
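Retry and cooldown behavior can also be made explicit. A minimal sketch, assuming the Router's num_retries, cooldown_time, and fallbacks parameters (verify against your installed litellm version; "gpt-backup" is a hypothetical second pool you would add to model_list):

router = Router(
    model_list=[...],  # same deployments as above, plus a "gpt-backup" pool
    num_retries=3,                                # retry transient errors before failing over
    cooldown_time=30,                             # seconds a failing deployment sits out
    fallbacks=[{"claude-fast": ["gpt-backup"]}],  # cross-pool fallback order
)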

Latency-based routing

router = Router(
    model_list=[...],
    routing_strategy="latency-based-routing",
    routing_strategy_args={"ttl": 25},  # re-evaluate every 25s
)

The router tracks each deployment's observed latency, re-evaluating every TTL seconds, and routes new requests to the fastest one. Critical for user-facing apps where p99 matters.

A/B testing

router = Router(model_list=[
    {
        "model_name": "experimental",
        "litellm_params": {"model": "openai/gpt-4o"},
        "model_info": {"weight": 0.1},  # 10% traffic
    },
    {
        "model_name": "experimental",
        "litellm_params": {"model": "anthropic/claude-3-5-sonnet-20241022"},
        "model_info": {"weight": 0.9},  # 90% traffic
    },
])

Track quality metrics on responses by model_used (returned in the response) to decide which to graduate.
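A minimal sketch of that bookkeeping, assuming the response follows the OpenAI format and reports the serving model on resp.model (field names can vary by litellm version):

from collections import defaultdict

served_counts = defaultdict(int)

resp = router.completion(
    model="experimental",
    messages=[{"role": "user", "content": "Summarize this ticket"}],
)

# Record which underlying model answered, then attach your own
# quality signal (eval score, thumbs up/down, etc.) keyed by it.
served_counts[resp.model] += 1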

Async support

import asyncio

async def main():
    resp = await router.acompletion(
        model="claude-fast",
        messages=[{"role": "user", "content": "Hello"}],
    )

asyncio.run(main())
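Since acompletion is a regular coroutine, fanning out many requests is plain asyncio; a sketch (the prompts list is illustrative):

async def fan_out(prompts):
    tasks = [
        router.acompletion(
            model="claude-fast",
            messages=[{"role": "user", "content": p}],
        )
        for p in prompts
    ]
    # The router still load-balances each request across the pool.
    return await asyncio.gather(*tasks)

results = asyncio.run(fan_out(["Hello", "Bonjour", "Hola"]))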

FAQ

Q: Router vs Proxy — which should I use? A: Router for single Python app (no extra container). Proxy for multi-team / multi-language (any client speaking OpenAI format can use it). Same routing logic, different deployment. Many teams use Router for prod and Proxy for dev/local.

Q: Does Router track costs? A: Yes — the response includes _response_ms and cost in _hidden_params. For persistent tracking, point Router at a callback (e.g. Langfuse, Helicone, OTEL) — config is one line.
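For example, callbacks are registered on the litellm module itself (Langfuse shown; assumes the Langfuse env vars are exported):

import litellm

# Log cost, latency, and usage for every successful call
# (requires LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY in the environment).
litellm.success_callback = ["langfuse"]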

Q: Can I add custom routing logic? A: Yes — routing_strategy='custom' and pass a callable. Useful for rules like 'always use Claude for queries with PII redaction enabled' or 'route by user tier'.
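A sketch of the shape such a rule might take, taking the FAQ's routing_strategy='custom' description at face value. The exact hook name and signature vary by litellm version, so treat the kwarg below as hypothetical:

def route_by_tier(deployments, request_kwargs):
    # Illustrative rule: enterprise-tier requests go to the Anthropic deployment.
    tier = (request_kwargs.get("metadata") or {}).get("tier")
    if tier == "enterprise":
        return [d for d in deployments
                if d["litellm_params"]["model"].startswith("anthropic/")]
    return deployments

router = Router(
    model_list=[...],
    routing_strategy="custom",  # per the FAQ above
    # "custom_routing_fn" is a hypothetical kwarg name; check the Router docs
    # for the exact hook your litellm version expects.
    routing_strategy_args={"custom_routing_fn": route_by_tier},
)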


Quick Use

  1. pip install litellm
  2. Set ANTHROPIC_API_KEY and OPENAI_API_KEY env vars
  3. Drop the Hello world snippet above into your Python app


Source & Thanks

Built by BerriAI. Licensed under MIT.

BerriAI/litellm — ⭐ 17,000+

🙏
