Why Choose It
LiteLLM is the "one SDK for every LLM" answer, plus a full Proxy server for teams that want a hosted gateway they control. The SDK alone normalizes inputs and outputs: completion(model="claude-3-5-sonnet", messages=[...]) works identically to the OpenAI call. The Proxy adds routing, budgets, key management, logging, and a Swagger UI.
It’s the most popular OSS gateway (25K+ GitHub stars) and the standard reference for framework-agnostic multi-model access. LangChain, LlamaIndex, and CrewAI all support LiteLLM as a model provider out of the box. If you’ve read "point it at any OpenAI-compatible endpoint" in a dozen READMEs, LiteLLM is how most of those setups work.
What you give up: polish. The dashboard exists but is functional, not beautiful. Observability is present but not deep — most teams pair LiteLLM Proxy with Langfuse or Helicone for traces. For the free-and-open price, you trade UX for control.
Quick Start — SDK or Proxy
The SDK is the fastest path to multi-provider support — no server to run. The Proxy is a small FastAPI server that exposes OpenAI-compatible endpoints; point any OpenAI SDK at it. Config-driven routing means you change providers or load-balance strategies without touching app code.
# Option A: SDK only (no server needed)
# pip install litellm
# Provider keys are read from the environment, e.g. ANTHROPIC_API_KEY
from litellm import completion

resp = completion(
    model="anthropic/claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Hello from LiteLLM"}],
)
print(resp.choices[0].message.content)
# Option B: Run the Proxy for team use
# pip install 'litellm[proxy]'
# litellm --config config.yaml --port 4000
#
# config.yaml:
# model_list:
#   - model_name: fast
#     litellm_params:
#       model: gpt-4o-mini
#       api_key: os.environ/OPENAI_KEY
#   - model_name: fast            # same name = one load-balanced group
#     litellm_params:
#       model: claude-3-5-haiku-20241022
#       api_key: os.environ/ANTHROPIC_KEY
# router_settings:
#   routing_strategy: usage-based-routing-v2
# Now call the proxy as if it were OpenAI
from openai import OpenAI
client = OpenAI(base_url="http://localhost:4000", api_key="sk-proxy-token")
r = client.chat.completions.create(model="fast", messages=[{"role":"user","content":"hi"}])
# Proxy load-balances between gpt-4o-mini and claude-3-5-haiku based on usage.
Core Capabilities
100+ providers
OpenAI, Anthropic, Gemini, Bedrock, Azure, Vertex, Ollama, Together, Fireworks, Anyscale, Groq, Mistral, Cohere, HuggingFace, and many more. All through the same completion() signature.
Proxy server
Production-grade FastAPI server: routing, load-balancing, retries, caching, key management, and user budgets. Deploy with Docker; expose as an internal OpenAI-compatible endpoint.
Budgets & rate limits
Per-user, per-team, per-key budgets enforced at the Proxy. Alerts on 80% / 100% spend. Essential for multi-tenant or internal platform-as-a-service setups.
Langfuse / Helicone / Sentry hooks
Native callback integrations. Pair LiteLLM Proxy with Langfuse for traces, Helicone for observability, Sentry for errors. Configure via proxy YAML.
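A minimal proxy config sketch for these callbacks. The `litellm_settings` keys below follow the documented callback config; verify the exact names against the current LiteLLM docs, and note that Langfuse and Sentry credentials are read from environment variables:

```yaml
litellm_settings:
  success_callback: ["langfuse"]   # send traces for successful calls to Langfuse
  failure_callback: ["sentry"]     # send errors to Sentry
# Requires LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY and SENTRY_DSN
# to be set in the proxy's environment.
```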
Fallback & retry
Declarative fallback lists: try Claude, fall back to GPT-4o, then to gpt-4o-mini. Exponential backoff built in. Configurable per route.
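As a sketch, a fallback chain in the proxy config might look like the following. The `fallbacks` and `num_retries` keys are taken from the documented config format, but their exact placement has shifted across versions, so check the current docs:

```yaml
litellm_settings:
  num_retries: 2                   # retries with exponential backoff
  fallbacks:
    # try claude-3-5-sonnet first, then gpt-4o, then gpt-4o-mini
    - {"claude-3-5-sonnet-20241022": ["gpt-4o", "gpt-4o-mini"]}
```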
Custom auth & RBAC
Proxy generates virtual keys per user; role-based access controls which models and budgets each user can hit. Integrates with your existing SSO via OIDC.
Comparison
| Tool | License | Deployment | Dashboard | Best For |
|---|---|---|---|---|
| LiteLLM (this) | MIT (SDK + Proxy) | Self-host | Functional | Teams wanting an OSS gateway + unified SDK |
| Portkey | Gateway Apache 2.0; cloud proprietary | Managed + self-host | Polished | Teams wanting managed UX |
| OpenRouter | Proprietary | Managed only | Web UI | Quick multi-model experiments |
| Cloudflare AI Gateway | Proprietary | Managed only | Web UI | Edge caching, simple setup |
Real-World Use Cases
01. Internal AI platform
Platform team runs LiteLLM Proxy; product teams hit one OpenAI-compatible endpoint. Central control over providers, keys, budgets; no central code deploys when a team wants a new model.
02. Multi-model apps
Agents that route between fast/cheap and slow/powerful models. LiteLLM’s unified completion() signature means the routing logic is 10 lines, not an integration per provider.
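Those "10 lines" can be sketched in plain Python. The model names and the length/keyword heuristic below are illustrative, not part of LiteLLM; the point is that routing reduces to picking a model string:

```python
# Route each request to a cheap or powerful model, then make one unified call.
# Swap the heuristic for your own criteria (token count, task type, user tier).
CHEAP = "gpt-4o-mini"
POWERFUL = "anthropic/claude-3-5-sonnet-20241022"

def pick_model(prompt: str) -> str:
    """Long or explicitly multi-step prompts go to the powerful model."""
    if len(prompt) > 500 or "step by step" in prompt.lower():
        return POWERFUL
    return CHEAP

# With LiteLLM, the actual call is the same regardless of which model wins:
# from litellm import completion
# resp = completion(model=pick_model(prompt),
#                   messages=[{"role": "user", "content": prompt}])
```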
03. Local + cloud hybrid
Use Ollama for dev and cheap inference, OpenAI/Claude for production. Same code path — switch via the model name.
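One way to express that switch, assuming a hypothetical `APP_ENV` variable (the env var name and model strings are illustrative):

```python
import os

def model_for_env() -> str:
    """Local Ollama model in dev, hosted model in production; same code path."""
    if os.environ.get("APP_ENV", "dev") == "production":
        return "anthropic/claude-3-5-sonnet-20241022"
    return "ollama/llama3.1"

# from litellm import completion
# resp = completion(model=model_for_env(), messages=[...])
```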
Pricing & License
LiteLLM: MIT license, free. The open-source project is maintained by BerriAI and a growing community; no paid SKU is required to run it. For commercial support, litellm.ai offers hosted and enterprise tiers with SLAs.
Operational cost: a small VM for the Proxy (a 2 vCPU / 4 GB instance comfortably handles typical internal traffic), plus your underlying LLM spend. No per-request gateway fees.
What you pay in hidden complexity: self-hosting means you own uptime, upgrades, and debugging. For teams that want to "pay and forget", Portkey or Cloudflare lower the ops burden at the cost of the control a self-hosted, MIT-licensed gateway gives you.
Related TokRepo Assets
LLM Gateway Comparison — Proxy Your AI Requests
Compare top LLM gateway and proxy tools for routing AI requests. Covers LiteLLM, Bifrost, Portkey, and OpenRouter for cost optimization, failover, and multi-provider access.
LiteLLM — Unified Proxy for 100+ LLM APIs
Python SDK and proxy server to call 100+ LLM APIs in OpenAI format. Cost tracking, guardrails, load balancing, logging. Supports Bedrock, Azure, Anthropic, Vertex, and more. 42K+ stars.
LiteLLM — Universal LLM API Gateway, 100+ Providers
Unified API proxy for 100+ LLM providers including OpenAI, Anthropic, Bedrock, Azure, and Vertex AI. Drop-in OpenAI replacement with load balancing and spend tracking. 18,000+ GitHub stars.
LLM Gateway Comparison — LiteLLM vs OpenRouter vs CF
In-depth comparison of LLM API gateways: LiteLLM (self-hosted proxy), OpenRouter (unified API), and Cloudflare AI Gateway (edge cache). Architecture, pricing, and when to use each.
FAQ
LiteLLM SDK vs LiteLLM Proxy — which do I need?
SDK for single apps: you want unified completion() calls, no server. Proxy for teams / internal platform: multiple apps share the gateway, centralized keys and budgets, OpenAI-compatible endpoint for tools that want one.
Does LiteLLM add latency?
SDK: ~0 (in-process). Proxy: 3-10ms hot-path overhead. Caching and load-balancing can save far more than they cost on realistic traffic.
How does LiteLLM compare to OpenRouter?
OpenRouter is a managed SaaS with pay-per-token pricing across providers. LiteLLM is self-hosted with BYO-keys. Use OpenRouter for fast experimentation or when you want one invoice; use LiteLLM when you want control over keys, budgets, and data flow.
Is LiteLLM production-ready?
Yes — it is deployed in production by many large organizations; see the GitHub README for the list of adopters. Expected caveats: the project moves fast, so watch the changelog for occasional breaking changes and upgrade in staging before production.
Does it work with Claude Code / Cursor / Cline?
Yes. Any tool that accepts an OpenAI-compatible endpoint (base URL + API key) works. Point Cursor or Cline at your LiteLLM Proxy, and the tool’s "OpenAI" integration now routes through your multi-provider gateway.
How do I add a new provider?
LiteLLM’s /providers list covers most mainstream LLMs. For new or custom ones, register a generic OpenAI-compatible endpoint in the model_list config — no code change needed.
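A sketch of that config, using the documented pattern of an `openai/` prefix plus `api_base` for generic OpenAI-compatible endpoints. The model names and URL below are placeholders for your own deployment:

```yaml
model_list:
  - model_name: my-model               # the name your apps will call
    litellm_params:
      model: openai/served-model-name  # openai/ prefix = generic OpenAI-compatible
      api_base: https://llm.internal.example.com/v1
      api_key: os.environ/MY_MODEL_KEY
```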