LiteLLM — Unified Proxy for 100+ LLM APIs
Python SDK and proxy server to call 100+ LLM APIs in OpenAI format. Cost tracking, guardrails, load balancing, logging. Supports Bedrock, Azure, Anthropic, Vertex, and more. 42K+ stars.
What it is
LiteLLM is a Python library and proxy server that provides a unified API interface for calling over 100 LLM providers. You write your code once using the OpenAI SDK format, and LiteLLM translates the call to whichever provider you specify: Anthropic Claude, Google Gemini, Azure OpenAI, AWS Bedrock, Ollama, Groq, Together AI, and many more. Switching between providers is a one-line change in the model name.
The library can be used as a Python SDK (direct import), as a standalone proxy server (OpenAI-compatible endpoint), or as a gateway for managing keys, budgets, and rate limits across teams. It is designed for developers and organizations that use multiple LLM providers and want a consistent interface without vendor lock-in.
How it saves time or tokens
Without LiteLLM, calling different providers requires different SDKs, different request formats, and different response parsing. Switching from OpenAI to Claude means rewriting API calls, changing authentication, and adjusting response handling. LiteLLM handles all provider differences behind a single completion() call.
The proxy server mode adds operational value: centralized API key management, per-user budgets, request logging, model fallbacks (try Claude, fall back to GPT-4 if it fails), and load balancing across multiple model deployments. These features save significant engineering time for teams running LLM operations at scale.
How to use
- Install LiteLLM:
```bash
pip install litellm
```
- Use as a Python SDK:
```python
from litellm import completion

# Call Claude
response = completion(
    model='anthropic/claude-sonnet-4-20250514',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)

# Switch to GPT-4o by changing one line
response = completion(
    model='gpt-4o',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
```
- Or run as a proxy server for team-wide access.
Example
Running LiteLLM as a proxy server:
```bash
# Start the proxy
litellm --model anthropic/claude-sonnet-4-20250514
```

The proxy listens on localhost:4000, and any OpenAI-compatible client can connect:

```bash
curl http://localhost:4000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
Proxy configuration with fallbacks and load balancing:
```yaml
# litellm_config.yaml
model_list:
  - model_name: default
    litellm_params:
      model: anthropic/claude-sonnet-4-20250514
      api_key: sk-ant-...
  - model_name: default
    litellm_params:
      model: gpt-4o
      api_key: sk-...
router_settings:
  routing_strategy: simple-shuffle  # load balance across both deployments
  num_retries: 2                    # on failure, retry (routing to the other deployment)
```
| Feature | SDK Mode | Proxy Mode |
|---|---|---|
| Provider translation | Yes | Yes |
| Streaming | Yes | Yes |
| Fallbacks | Yes | Yes |
| Budget management | No | Yes |
| Key management | No | Yes |
| Request logging | Basic | Full |
| Team access control | No | Yes |
Related on TokRepo
- AI Gateway Tools — Compare LiteLLM with other LLM gateway and proxy solutions.
- AI Gateway: LiteLLM — Deep dive into LiteLLM configuration on TokRepo.
Common pitfalls
- Not setting the correct environment variable for each provider. LiteLLM reads API keys from environment variables: `ANTHROPIC_API_KEY` for Claude, `OPENAI_API_KEY` for OpenAI, `GEMINI_API_KEY` for Google Gemini. Missing keys produce authentication errors that can be confusing when switching providers.
- Assuming all providers support all features. Some providers do not support streaming, function calling, vision inputs, or JSON mode. LiteLLM translates what it can, but if a provider does not support a feature, the call fails. Check the provider's capabilities before relying on advanced features.
- Running the proxy without rate limiting in production. Without rate limits, a single client can exhaust your API budget. Configure per-user or per-team budgets in the proxy config to prevent runaway costs.
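To make the first pitfall concrete, here is a minimal sketch of setting provider credentials before calling `completion()`. The key values are placeholders, and the live call is commented out since it needs real credentials:

```python
import os

# LiteLLM reads provider credentials from environment variables.
# These values are placeholders, not real keys.
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-placeholder"
os.environ["OPENAI_API_KEY"] = "sk-placeholder"

# With both variables set, the same call works against either provider:
# from litellm import completion
# response = completion(
#     model="anthropic/claude-sonnet-4-20250514",
#     messages=[{"role": "user", "content": "Hello!"}],
# )
```

Setting the variables in a shell profile or a secrets manager is equivalent; the point is that each provider has its own variable, and only the `model` string selects between them.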
Frequently Asked Questions
Which providers does LiteLLM support?
LiteLLM supports over 100 providers including OpenAI, Anthropic, Google Gemini, Azure OpenAI, AWS Bedrock, Google Vertex AI, Cohere, Mistral, Groq, Together AI, Perplexity, Ollama, vLLM, and many more. The full list is maintained in the LiteLLM documentation and grows with each release.
Can I use my existing OpenAI client with the proxy?
Yes. The proxy server is fully OpenAI-compatible. Point your existing OpenAI SDK client at http://localhost:4000 instead of api.openai.com, and your code works without modification. The proxy translates each request to whichever backend model you configure.
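Because the proxy speaks the OpenAI wire format, any HTTP client works. As a sketch using only the standard library (the URL assumes the default proxy port; the actual send is commented out since it needs a running proxy):

```python
import json
from urllib import request

# An OpenAI-format chat request aimed at a local LiteLLM proxy.
payload = {
    "model": "anthropic/claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Hello!"}],
}
req = request.Request(
    "http://localhost:4000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# resp = request.urlopen(req)  # requires the proxy to be running
```

The same request body sent to api.openai.com would need an OpenAI model name; here the proxy does the provider translation, so the model string can name any configured backend.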
Does LiteLLM support streaming?
Yes. Pass stream=True in the completion call, and LiteLLM streams responses from any supported provider. The streaming format follows the OpenAI SSE format, so any client that handles OpenAI streaming works with LiteLLM streams from any provider.
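A sketch of how streamed chunks are typically consumed. The live call is commented out (it needs an API key); the loop below runs on placeholder dicts shaped like the OpenAI delta format the answer above describes:

```python
# from litellm import completion
# response = completion(model="gpt-4o",
#                       messages=[{"role": "user", "content": "Hello!"}],
#                       stream=True)

# Placeholder chunks standing in for a live stream; each chunk
# carries an incremental "delta" in the OpenAI streaming shape.
chunks = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo!"}}]},
]
text = "".join(c["choices"][0]["delta"].get("content", "") for c in chunks)
print(text)
```

The `.get("content", "")` guard matters in real streams, since some chunks (for example, the final one) may carry a delta without a content field.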
How do fallbacks work?
Configure multiple models under the same model_name in the proxy config. If the primary model fails (rate limit, error, timeout), LiteLLM automatically retries with the next model in the list. This provides automatic resilience against provider outages without changing client code.
Is LiteLLM suitable for production?
Yes. The proxy mode is designed for production with features like key management, budget controls, request logging, and automatic retries. Deploy the proxy as a Docker container or systemd service. Many organizations use LiteLLM as their central LLM gateway for team-wide access.
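As a rough sketch of the Docker route mentioned above (the image name and flags reflect common usage and should be checked against the proxy docs before deploying):

```shell
# Run the LiteLLM proxy in a container, mounting a local config file.
# Image tag, paths, and key value are assumptions, not verified specifics.
docker run -p 4000:4000 \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  -v $(pwd)/litellm_config.yaml:/app/config.yaml \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml
```

Passing keys as environment variables keeps them out of the mounted config file, which can then be committed or shared safely.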
Citations (3)
- LiteLLM GitHub Repository — LiteLLM unified proxy for 100+ LLM providers
- LiteLLM Documentation — LiteLLM documentation and provider support
- LiteLLM Proxy Docs — LiteLLM proxy server configuration
Source & Thanks
Created by BerriAI. Licensed under MIT. BerriAI/litellm — 42,000+ GitHub stars