# LiteLLM Router — Smart Failover & Load Balancing in Python

> LiteLLM Router routes requests across LLM endpoints with retries, fallbacks, latency-based routing, and weighted A/B splits. Pure Python — drop it into any codebase, no separate proxy needed.

## Install

`pip install litellm`, then save the snippets below as a script file and run it.

## Quick Use

1. `pip install litellm`
2. Set the `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` env vars
3. Drop the Router snippet below into your Python app

---

## Intro

LiteLLM Router is the Python-native version of the LiteLLM Proxy — the same routing logic (failover, load balancing, A/B splits, latency-aware routing) with no proxy server required. Import the `Router` class, define your model list, and call `.completion()`.

**Best for:** Python apps where you want LiteLLM's resilience without running a separate Docker proxy.

**Works with:** any Python ≥3.8 project, async + sync.

**Setup time:** 2 minutes (`pip install litellm` + 20 lines).

---

### Hello world

```python
import os

from litellm import Router

router = Router(model_list=[
    # Primary
    {
        "model_name": "claude-fast",
        "litellm_params": {
            "model": "anthropic/claude-3-5-haiku-20241022",
            "api_key": os.environ["ANTHROPIC_API_KEY"],
            "rpm": 1000,  # requests per minute
        },
    },
    # Fallback
    {
        "model_name": "claude-fast",  # same name = same pool
        "litellm_params": {
            "model": "openai/gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
            "rpm": 5000,
        },
    },
])

# Router picks one based on load + health
resp = router.completion(
    model="claude-fast",
    messages=[{"role": "user", "content": "Hello"}],
)
```

If Anthropic is down or rate-limited, the router silently falls back to GPT-4o-mini. The caller gets a successful response either way. (An explicit retry/fallback configuration is sketched after the FAQ.)

### Latency-based routing

```python
router = Router(
    model_list=[...],
    routing_strategy="latency-based-routing",
    routing_strategy_args={"ttl": 25},  # re-evaluate every 25s
)
```

The router re-evaluates each endpoint's latency every TTL seconds and routes new requests to the fastest one. Critical for user-facing apps where p99 latency matters.

### A/B testing

```python
router = Router(model_list=[
    {
        "model_name": "experimental",
        "litellm_params": {"model": "openai/gpt-4o"},
        "model_info": {"weight": 0.1},  # 10% traffic
    },
    {
        "model_name": "experimental",
        "litellm_params": {"model": "anthropic/claude-3-5-sonnet-20241022"},
        "model_info": {"weight": 0.9},  # 90% traffic
    },
])
```

Track quality metrics on responses by `model_used` (returned in the response) to decide which model to graduate.

### Async support

```python
import asyncio

async def main():
    # Reuses the `router` defined in the hello-world snippet.
    resp = await router.acompletion(
        model="claude-fast",
        messages=[{"role": "user", "content": "Hello"}],
    )

asyncio.run(main())
```

(A concurrent fan-out variant is sketched after the FAQ.)

---

### FAQ

**Q: Router vs Proxy — which should I use?**

A: Router for a single Python app (no extra container). Proxy for multi-team / multi-language setups (any client that speaks the OpenAI format can use it). Same routing logic, different deployment model. Many teams use Router in prod and the Proxy for dev/local.

**Q: Does Router track costs?**

A: Yes — the response includes `_response_ms` and the cost in `_hidden_params`. For persistent tracking, point Router at a callback (e.g. Langfuse, Helicone, OTEL) — the config is one line; see the sketch below.

**Q: Can I add custom routing logic?**

A: Yes — set `routing_strategy='custom'` and pass a callable. Useful for rules like "always use Claude for queries with PII redaction enabled" or "route by user tier".
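For illustration, here is a hedged sketch of one way to wire that up, using the `CustomRoutingStrategyBase` hook from `litellm.router`. The tier rule and the `metadata["tier"]` key are invented for this example, and the exact hook names can vary by LiteLLM version, so check the Router docs before relying on it:

```python
from litellm.router import CustomRoutingStrategyBase

class TierRouting(CustomRoutingStrategyBase):
    # Hypothetical rule: paid users get the bigger model, everyone else
    # gets the cheap one. Both models must appear in the router's model_list.
    async def async_get_available_deployment(
        self, model, messages=None, input=None,
        specific_deployment=False, request_kwargs=None,
    ):
        metadata = (request_kwargs or {}).get("metadata") or {}
        wanted = "openai/gpt-4o" if metadata.get("tier") == "paid" else "openai/gpt-4o-mini"
        for deployment in router.model_list:
            if deployment["litellm_params"]["model"] == wanted:
                return deployment
        return router.model_list[0]  # safe default: first deployment

router.set_custom_routing_strategy(TierRouting())
```

Callers then pass the tier along with the request, e.g. `router.completion(model="experimental", messages=..., metadata={"tier": "paid"})`.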
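Picking up the cost question above: a minimal sketch of the one-line callback config, assuming Langfuse credentials are already set as env vars (`LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`); the exact `_hidden_params` keys can differ between LiteLLM versions:

```python
import litellm

# One line: ship every successful call (tokens, cost, latency) to Langfuse.
# Helicone and OTEL callbacks are registered the same way.
litellm.success_callback = ["langfuse"]

resp = router.completion(
    model="claude-fast",
    messages=[{"role": "user", "content": "Hello"}],
)

# Ad-hoc per-call numbers, no callback needed:
print(resp._hidden_params.get("response_cost"))  # cost in USD, if populated
```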
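This is the explicit retry/fallback configuration referenced from the hello-world section. A sketch under two assumptions: the hello-world deployments are bound to a `model_list` variable, and a second pool named `gpt-fallback` (an illustrative name) also exists in that list:

```python
router = Router(
    model_list=model_list,  # the deployments from the hello-world snippet
    num_retries=2,          # retry within the same pool first
    # If "claude-fast" is still failing, route the request to another pool.
    fallbacks=[{"claude-fast": ["gpt-fallback"]}],
    timeout=30,             # per-request timeout in seconds
)
```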
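And the concurrent fan-out variant referenced from the async section: since `acompletion` is an ordinary coroutine, fanning out is plain `asyncio`, with the router spreading load across the pool as usual:

```python
import asyncio

async def fan_out(prompts):
    # Fire all requests concurrently through the same router.
    tasks = [
        router.acompletion(
            model="claude-fast",
            messages=[{"role": "user", "content": p}],
        )
        for p in prompts
    ]
    return await asyncio.gather(*tasks)

results = asyncio.run(fan_out(["Hello", "Bonjour", "Hola"]))
print([r.choices[0].message.content for r in results])
```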
---

## Source & Thanks

> Built by [BerriAI](https://github.com/BerriAI). Licensed under MIT.
>
> [BerriAI/litellm](https://github.com/BerriAI/litellm) — ⭐ 17,000+

---

Source: https://tokrepo.com/en/workflows/litellm-router-smart-failover-load-balancing-in-python
Author: LiteLLM (BerriAI)