Scripts · Mar 30, 2026 · 2 min read

LiteLLM — Unified Proxy for 100+ LLM APIs

Python SDK and proxy server to call 100+ LLM APIs in OpenAI format. Cost tracking, guardrails, load balancing, logging. Supports Bedrock, Azure, Anthropic, Vertex, and more. 42K+ stars.

TL;DR
A Python proxy that translates OpenAI-format API calls to 100+ LLM providers for seamless model switching.
§01

What it is

LiteLLM is a Python library and proxy server that provides a unified API interface for calling over 100 LLM providers. You write your code once using the OpenAI SDK format, and LiteLLM translates the call to whichever provider you specify: Anthropic Claude, Google Gemini, Azure OpenAI, AWS Bedrock, Ollama, Groq, Together AI, and many more. Switching between providers is a one-line change in the model name.

The library can be used as a Python SDK (direct import), as a standalone proxy server (OpenAI-compatible endpoint), or as a gateway for managing keys, budgets, and rate limits across teams. It is designed for developers and organizations that use multiple LLM providers and want a consistent interface without vendor lock-in.

§02

How it saves time or tokens

Without LiteLLM, calling different providers requires different SDKs, different request formats, and different response parsing. Switching from OpenAI to Claude means rewriting API calls, changing authentication, and adjusting response handling. LiteLLM handles all provider differences behind a single completion() call.

The proxy server mode adds operational value: centralized API key management, per-user budgets, request logging, model fallbacks (try Claude, fall back to GPT-4 if it fails), and load balancing across multiple model deployments. These features save significant engineering time for teams running LLM operations at scale.
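
The same fallback and load-balancing behavior is also available in SDK mode through LiteLLM's Router class. A minimal sketch, assuming two deployments registered under one shared name (API keys are read from the environment; model names mirror the examples below):

```python
from litellm import Router

# Two deployments share the model_name "default"; the router
# load-balances between them and retries failed requests.
router = Router(
    model_list=[
        {
            "model_name": "default",
            "litellm_params": {"model": "anthropic/claude-sonnet-4-20250514"},
        },
        {
            "model_name": "default",
            "litellm_params": {"model": "gpt-4o"},
        },
    ],
    num_retries=2,  # a retry may land on the other deployment
)

response = router.completion(
    model="default",
    messages=[{"role": "user", "content": "Hello!"}],
)
```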

§03

How to use

  1. Install LiteLLM:

```bash
pip install litellm
```

  2. Use as a Python SDK:

```python
from litellm import completion

# Call Claude (reads ANTHROPIC_API_KEY from the environment)
response = completion(
    model='anthropic/claude-sonnet-4-20250514',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)

# Switch to GPT-4o by changing one line (reads OPENAI_API_KEY)
response = completion(
    model='gpt-4o',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
```

  3. Or run as a proxy server for team-wide access (see the example below).
§04

Example

Running LiteLLM as a proxy server:

```bash
# Start the proxy
litellm --model anthropic/claude-sonnet-4-20250514

# The proxy listens on localhost:4000;
# any OpenAI-compatible client can connect
curl http://localhost:4000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

Proxy configuration with fallbacks and load balancing:

```yaml
# litellm_config.yaml
model_list:
  - model_name: default
    litellm_params:
      model: anthropic/claude-sonnet-4-20250514
      api_key: sk-ant-...
  - model_name: default
    litellm_params:
      model: gpt-4o
      api_key: sk-...

router_settings:
  routing_strategy: simple-shuffle  # load balance across deployments
  num_retries: 2                    # retry on failure; a retry may hit the other deployment
```

| Feature | SDK Mode | Proxy Mode |
| --- | --- | --- |
| Provider translation | Yes | Yes |
| Streaming | Yes | Yes |
| Fallbacks | Yes | Yes |
| Budget management | No | Yes |
| Key management | No | Yes |
| Request logging | Basic | Full |
| Team access control | No | Yes |
§05

Common pitfalls

  • Not setting the correct environment variable for each provider. LiteLLM reads API keys from environment variables: ANTHROPIC_API_KEY for Claude, OPENAI_API_KEY for OpenAI, GEMINI_API_KEY for Google. Missing keys produce authentication errors that can be confusing when switching providers (see the sketch after this list).
  • Assuming all providers support all features. Some providers do not support streaming, function calling, vision inputs, or JSON mode. LiteLLM translates what it can, but if a provider does not support a feature, the call fails. Check the provider's capabilities before relying on advanced features; the sketch after this list shows one way to probe them.
  • Running the proxy without rate limiting in production. Without rate limits, a single client can exhaust your API budget. Configure per-user or per-team budgets in the proxy config to prevent runaway costs.
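
The first two pitfalls can be caught before the first real request. A minimal sketch, assuming LiteLLM's documented supports_function_calling helper; the API key value is a placeholder:

```python
import os

import litellm

# Each provider reads its own environment variable; a missing key
# only surfaces as an authentication error at call time.
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."  # placeholder, set a real key

# Probe capabilities before relying on advanced features.
model = "anthropic/claude-sonnet-4-20250514"
if not litellm.supports_function_calling(model=model):
    raise RuntimeError(f"{model} does not support function calling")
```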

Frequently Asked Questions

How many LLM providers does LiteLLM support?

LiteLLM supports over 100 providers including OpenAI, Anthropic, Google Gemini, Azure OpenAI, AWS Bedrock, Google Vertex AI, Cohere, Mistral, Groq, Together AI, Perplexity, Ollama, vLLM, and many more. The full list is maintained in the LiteLLM documentation and grows with each release.

Can I use LiteLLM with existing OpenAI SDK code?

Yes. The proxy server is fully OpenAI-compatible. Point your existing OpenAI SDK client to http://localhost:4000 instead of api.openai.com, and your code works without modification. The proxy translates the request to whichever backend model you configure.
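
A minimal sketch with the official openai Python package; the api_key value here is a placeholder for whatever key your proxy is configured to accept:

```python
from openai import OpenAI

# Point the stock OpenAI client at the LiteLLM proxy instead of
# api.openai.com; no other code changes are required.
client = OpenAI(
    base_url="http://localhost:4000",
    api_key="sk-anything",  # placeholder; the proxy validates its own keys
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```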

Does LiteLLM support streaming responses?

Yes. Pass stream=True in the completion call, and LiteLLM streams responses from any supported provider. The streaming format follows the OpenAI SSE format, so any client that handles OpenAI streaming works with LiteLLM streams from any provider.
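
In SDK mode this looks like the following sketch (the model name mirrors the examples above; delta.content may be None on control chunks):

```python
from litellm import completion

# stream=True yields OpenAI-style chunks from any provider
response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
```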

How do fallbacks work?

Configure multiple models for the same model_name in the proxy config. If the primary model fails (rate limit, error, timeout), LiteLLM automatically retries with the next model in the list. This provides automatic resilience against provider outages without changing client code.

Is LiteLLM suitable for production use?

Yes. The proxy mode is designed for production with features like key management, budget controls, request logging, and automatic retries. Deploy the proxy as a Docker container or systemd service. Many organizations use LiteLLM as their central LLM gateway for team-wide access.


Source & Thanks

Created by BerriAI. Licensed under MIT. BerriAI/litellm — 42,000+ GitHub stars
