# LiteLLM Router — Smart Failover & Load Balancing in Python

> LiteLLM Router routes requests across LLM endpoints with retries, fallbacks, latency-based routing, and weighted A/B splits. Pure Python — drop it into any codebase, no separate proxy needed.

## Install

`pip install litellm`, then save the snippets below as a script file and run it.

## Quick Use

1. `pip install litellm`
2. Set the `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` env vars
3. Drop the Router snippet below into your Python app

---

## Intro

LiteLLM Router is the Python-native version of the LiteLLM Proxy — the same routing logic (failover, load balancing, A/B splits, latency-aware routing) with no proxy server required. Import the `Router` class, define your model list, and call `.completion()`.

**Best for:** Python apps where you want LiteLLM's resilience without running a separate Docker proxy.

**Works with:** any Python ≥3.8 project, async + sync.

**Setup time:** 2 minutes (`pip install litellm` + 20 lines).

---

### Hello world

```python
import os

from litellm import Router

router = Router(model_list=[
    # Primary
    {
        "model_name": "claude-fast",
        "litellm_params": {
            "model": "anthropic/claude-3-5-haiku-20241022",
            "api_key": os.environ["ANTHROPIC_API_KEY"],
            "rpm": 1000,  # requests per minute
        },
    },
    # Fallback
    {
        "model_name": "claude-fast",  # same name = same pool
        "litellm_params": {
            "model": "openai/gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
            "rpm": 5000,
        },
    },
])

# Router picks one based on load + health
resp = router.completion(
    model="claude-fast",
    messages=[{"role": "user", "content": "Hello"}],
)
```

If Anthropic is down or rate-limited, the router silently falls back to GPT-4o-mini. The caller gets a successful response either way. (An explicit retry/fallback configuration is sketched after the FAQ.)

### Latency-based routing

```python
router = Router(
    model_list=[...],
    routing_strategy="latency-based-routing",
    routing_strategy_args={"ttl": 25},  # re-evaluate every 25s
)
```

The router re-evaluates each endpoint's latency every TTL seconds and routes new requests to the fastest one. Critical for user-facing apps where p99 latency matters.

### A/B testing

```python
router = Router(model_list=[
    {
        "model_name": "experimental",
        "litellm_params": {"model": "openai/gpt-4o"},
        "model_info": {"weight": 0.1},  # 10% traffic
    },
    {
        "model_name": "experimental",
        "litellm_params": {"model": "anthropic/claude-3-5-sonnet-20241022"},
        "model_info": {"weight": 0.9},  # 90% traffic
    },
])
```

Track quality metrics on responses by `model_used` (returned in the response) to decide which model to graduate.

### Async support

```python
import asyncio

async def main():
    # Reuses the `router` defined in the hello-world snippet.
    resp = await router.acompletion(
        model="claude-fast",
        messages=[{"role": "user", "content": "Hello"}],
    )

asyncio.run(main())
```

(A concurrent fan-out variant is sketched after the FAQ.)

---

### FAQ

**Q: Router vs Proxy — which should I use?**

A: Router for a single Python app (no extra container). Proxy for multi-team / multi-language setups (any client that speaks the OpenAI format can use it). Same routing logic, different deployment model. Many teams use Router in prod and the Proxy for dev/local.

**Q: Does Router track costs?**

A: Yes — the response includes `_response_ms` and the cost in `_hidden_params`. For persistent tracking, point Router at a callback (e.g. Langfuse, Helicone, OTEL) — the config is one line; see the sketch below.

**Q: Can I add custom routing logic?**

A: Yes — set `routing_strategy='custom'` and pass a callable. Useful for rules like "always use Claude for queries with PII redaction enabled" or "route by user tier".
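For illustration, here is a hedged sketch of one way to wire that up, using the `CustomRoutingStrategyBase` hook from `litellm.router`. The tier rule and the `metadata["tier"]` key are invented for this example, and the exact hook names can vary by LiteLLM version, so check the Router docs before relying on it:

```python
from litellm.router import CustomRoutingStrategyBase

class TierRouting(CustomRoutingStrategyBase):
    # Hypothetical rule: paid users get the bigger model, everyone else
    # gets the cheap one. Both models must appear in the router's model_list.
    async def async_get_available_deployment(
        self, model, messages=None, input=None,
        specific_deployment=False, request_kwargs=None,
    ):
        metadata = (request_kwargs or {}).get("metadata") or {}
        wanted = "openai/gpt-4o" if metadata.get("tier") == "paid" else "openai/gpt-4o-mini"
        for deployment in router.model_list:
            if deployment["litellm_params"]["model"] == wanted:
                return deployment
        return router.model_list[0]  # safe default: first deployment

router.set_custom_routing_strategy(TierRouting())
```

Callers then pass the tier along with the request, e.g. `router.completion(model="experimental", messages=..., metadata={"tier": "paid"})`.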
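Picking up the cost question above: a minimal sketch of the one-line callback config, assuming Langfuse credentials are already set as env vars (`LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`); the exact `_hidden_params` keys can differ between LiteLLM versions:

```python
import litellm

# One line: ship every successful call (tokens, cost, latency) to Langfuse.
# Helicone and OTEL callbacks are registered the same way.
litellm.success_callback = ["langfuse"]

resp = router.completion(
    model="claude-fast",
    messages=[{"role": "user", "content": "Hello"}],
)

# Ad-hoc per-call numbers, no callback needed:
print(resp._hidden_params.get("response_cost"))  # cost in USD, if populated
```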
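This is the explicit retry/fallback configuration referenced from the hello-world section. A sketch under two assumptions: the hello-world deployments are bound to a `model_list` variable, and a second pool named `gpt-fallback` (an illustrative name) also exists in that list:

```python
router = Router(
    model_list=model_list,  # the deployments from the hello-world snippet
    num_retries=2,          # retry within the same pool first
    # If "claude-fast" is still failing, route the request to another pool.
    fallbacks=[{"claude-fast": ["gpt-fallback"]}],
    timeout=30,             # per-request timeout in seconds
)
```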
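And the concurrent fan-out variant referenced from the async section: since `acompletion` is an ordinary coroutine, fanning out is plain `asyncio`, with the router spreading load across the pool as usual:

```python
import asyncio

async def fan_out(prompts):
    # Fire all requests concurrently through the same router.
    tasks = [
        router.acompletion(
            model="claude-fast",
            messages=[{"role": "user", "content": p}],
        )
        for p in prompts
    ]
    return await asyncio.gather(*tasks)

results = asyncio.run(fan_out(["Hello", "Bonjour", "Hola"]))
print([r.choices[0].message.content for r in results])
```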
---

## Source & Thanks

> Built by [BerriAI](https://github.com/BerriAI). Licensed under MIT.
>
> [BerriAI/litellm](https://github.com/BerriAI/litellm) — ⭐ 17,000+

---

Source: https://tokrepo.com/en/workflows/litellm-router-smart-failover-load-balancing-in-python
Author: LiteLLM (BerriAI)