Quick Use
- pip install litellm
- Set ANTHROPIC_API_KEY and OPENAI_API_KEY env vars
- Drop the Router snippet below into your Python app
Intro
LiteLLM Router is the Python-native version of the LiteLLM Proxy: the same routing logic (failover, load balancing, A/B tests, latency-aware routing) with no proxy server required. Import the Router class, define your model list, call .completion().
Best for: Python apps that want LiteLLM's resilience without running a separate Docker proxy.
Works with: any Python ≥3.8 project, async + sync.
Setup time: 2 minutes (pip install litellm + 20 lines).
Hello world
import os

from litellm import Router

router = Router(model_list=[
    # Primary
    {
        "model_name": "claude-fast",
        "litellm_params": {
            "model": "anthropic/claude-3-5-haiku-20241022",
            "api_key": os.environ["ANTHROPIC_API_KEY"],
            "rpm": 1000,  # requests per minute
        },
    },
    # Fallback
    {
        "model_name": "claude-fast",  # same name = same pool
        "litellm_params": {
            "model": "openai/gpt-4o-mini",
            "api_key": os.environ["OPENAI_API_KEY"],
            "rpm": 5000,
        },
    },
])

# Router picks one based on load + health
resp = router.completion(
    model="claude-fast",
    messages=[{"role": "user", "content": "Hello"}],
)

If Anthropic is down or rate-limited, the router silently falls back to GPT-4o-mini. The caller gets a successful response either way.
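Sharing a model_name pools the two deployments implicitly. You can also make retries and cross-pool failover explicit with Router's num_retries, fallbacks, and timeout arguments. A minimal sketch, with illustrative group names ("primary", "backup"):

router = Router(
    model_list=[
        {"model_name": "primary",
         "litellm_params": {"model": "anthropic/claude-3-5-haiku-20241022"}},
        {"model_name": "backup",
         "litellm_params": {"model": "openai/gpt-4o-mini"}},
    ],
    num_retries=2,                        # retry within the pool first
    fallbacks=[{"primary": ["backup"]}],  # then fail over to the "backup" group
    timeout=30,                           # seconds per attempt
)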
Latency-based routing
router = Router(
    model_list=[...],
    routing_strategy="latency-based-routing",
    routing_strategy_args={"ttl": 25},  # re-evaluate every 25s
)

The router tracks the observed response latency of each deployment, re-evaluates every ttl seconds, and routes new requests to the fastest one. Critical for user-facing apps where p99 matters.
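Latency-based routing is one of several built-in strategies. The names below are documented routing_strategy values, though availability can vary by LiteLLM version:

# Swap in one of the other built-in strategies:
router = Router(model_list=[...], routing_strategy="simple-shuffle")       # weighted random (the default)
router = Router(model_list=[...], routing_strategy="least-busy")           # fewest in-flight requests
router = Router(model_list=[...], routing_strategy="usage-based-routing")  # honors rpm/tpm budgets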
A/B testing
router = Router(model_list=[
    {
        "model_name": "experimental",
        "litellm_params": {"model": "openai/gpt-4o"},
        "model_info": {"weight": 0.1},  # 10% of traffic
    },
    {
        "model_name": "experimental",
        "litellm_params": {"model": "anthropic/claude-3-5-sonnet-20241022"},
        "model_info": {"weight": 0.9},  # 90% of traffic
    },
])

Track quality metrics per variant (the response's model field tells you which deployment served the request) to decide which one to graduate, as in the sketch below.
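A minimal attribution sketch; resp.model is the standard OpenAI-format response field, while log_quality_metric is a hypothetical hook standing in for your own metrics pipeline:

resp = router.completion(
    model="experimental",
    messages=[{"role": "user", "content": "Summarize this ticket."}],
)
served_by = resp.model  # e.g. "gpt-4o" or "claude-3-5-sonnet-20241022"
log_quality_metric(served_by, resp)  # hypothetical: your metrics hook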
Async support
import asyncio

async def main():
    # acompletion() is the async counterpart of .completion()
    resp = await router.acompletion(
        model="claude-fast",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)
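# Where async pays off: fanning several prompts out concurrently instead of
# serially. A sketch using plain asyncio.gather; nothing LiteLLM-specific
# beyond the acompletion() call shown above.
async def fan_out(prompts):
    tasks = [
        router.acompletion(
            model="claude-fast",
            messages=[{"role": "user", "content": p}],
        )
        for p in prompts
    ]
    return await asyncio.gather(*tasks)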
asyncio.run(main())

FAQ
Q: Router vs Proxy, which should I use?
A: Router for a single Python app (no extra container). Proxy for multi-team / multi-language setups (any client speaking the OpenAI format can use it). Same routing logic, different deployment. Many teams use Router for prod and Proxy for dev/local.
Q: Does Router track costs?
A: Yes — the response includes _response_ms and cost in _hidden_params. For persistent tracking, point Router at a callback (e.g. Langfuse, Helicone, OTEL) — config is one line.
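A sketch of both options; the one-line callback is the documented litellm.success_callback setting, while the response_cost key name inside _hidden_params is an assumption that can vary by version:

import litellm

litellm.success_callback = ["langfuse"]  # persistent tracking: one line

resp = router.completion(
    model="claude-fast",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp._hidden_params.get("response_cost"))  # per-call cost (key name: assumption)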
Q: Can I add custom routing logic?
A: Yes — routing_strategy='custom' and pass a callable. Useful for rules like 'always use Claude for queries with PII redaction enabled' or 'route by user tier'.
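The exact hook has changed across LiteLLM versions (recent releases register a strategy object via router.set_custom_routing_strategy), so the sketch below only shows the shape of the selection logic; the function signature and registration are assumptions, not a pinned API:

# Illustrative only: route by user tier. Check your LiteLLM version's docs
# for the real registration mechanism and callable signature.
def pick_deployment(deployments, request_kwargs):
    tier = (request_kwargs.get("metadata") or {}).get("user_tier", "free")
    wanted = "gpt-4o" if tier == "paid" else "gpt-4o-mini"
    for d in deployments:
        if wanted in d["litellm_params"]["model"]:
            return d
    return deployments[0]  # default to the first deployment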
Source & Thanks
Built by BerriAI. Licensed under MIT.
BerriAI/litellm — ⭐ 17,000+