Workflows · May 7, 2026 · 4 min read

LiteLLM Proxy — Unified Gateway for 100+ LLM APIs

LiteLLM Proxy maps 100+ LLM providers (Anthropic, OpenAI, Bedrock, Vertex) to one OpenAI-compatible endpoint, and adds auth, rate limiting, cost tracking, and fallbacks.

Agent-ready

Safe staging for this asset

This asset is staged first. The copied prompt asks the agent to inspect the staged files before activating any scripts, MCP config, or global config.

Stage only · 29/100 · Policy: staging
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Stage only
Trust: Community
Entry: Asset

Safe staging command
npx -y tokrepo@latest install 0f113965-1adc-4435-982b-fb613fa4d157 --target codex

The command leaves the files in staging first; activation requires reviewing the README and the staged plan.

Introduction

LiteLLM Proxy is a self-hostable gateway that exposes 100+ LLM providers as one OpenAI-compatible endpoint. Point any OpenAI SDK at the proxy and route to Anthropic, Bedrock, Vertex, Together, Groq, or local Ollama without changing client code beyond the base URL and API key. It adds team-level auth, rate limits, cost tracking, and fallbacks.

Best for: enterprises with multi-team or multi-provider LLM use, or anyone who wants to swap models without rewriting clients.
Works with: any OpenAI SDK (Python, Node, Go, etc.), Claude Code, and Cursor (via a custom OpenAI base URL).
Setup time: about 5 minutes with Docker.


Run the proxy

# config.yaml
model_list:
  - model_name: claude-3-5-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: codestral
    litellm_params:
      model: ollama/codestral
      # When the proxy runs in Docker, localhost is the container itself;
      # point this at an address where the container can reach Ollama.
      api_base: http://localhost:11434

general_settings:
  master_key: sk-1234  # example value only; use a strong secret in real deployments
  database_url: postgresql://...  # needed for virtual keys and spend tracking

router_settings:
  routing_strategy: simple-shuffle
  fallbacks:
    - claude-3-5-sonnet: ["gpt-4o"]  # if Claude rate-limits, try GPT

Start the proxy with Docker, passing the provider keys through from your shell:

docker run -p 4000:4000 \
  -v "$(pwd)/config.yaml:/app/config.yaml" \
  -e ANTHROPIC_API_KEY \
  -e OPENAI_API_KEY \
  ghcr.io/berriai/litellm:main-stable \
  --config /app/config.yaml
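
The fallbacks entry in the config can be read as an ordered retry list: try the requested model, and if it is rate-limited, try each listed fallback in turn. The following is a minimal sketch of that idea in plain Python, not LiteLLM's actual router code. `RateLimited`, `call_with_fallbacks`, and the `send` callable are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List, Optional


class RateLimited(Exception):
    """Hypothetical error standing in for a provider rate-limit response."""


def call_with_fallbacks(
    model: str,
    fallbacks: Dict[str, List[str]],
    send: Callable[[str], str],
) -> str:
    """Try `model` first, then each configured fallback in order."""
    candidates = [model] + fallbacks.get(model, [])
    last_error: Optional[Exception] = None
    for name in candidates:
        try:
            return send(name)
        except RateLimited as exc:
            last_error = exc  # remember the failure and try the next candidate
    raise RuntimeError(f"all candidates failed for {model}") from last_error


# Same mapping as the fallbacks entry in config.yaml.
FALLBACKS = {"claude-3-5-sonnet": ["gpt-4o"]}


def fake_send(name: str) -> str:
    """Stand-in for a provider call: the primary model is always rate-limited."""
    if name == "claude-3-5-sonnet":
        raise RateLimited("429 from provider")
    return f"answered by {name}"


result = call_with_fallbacks("claude-3-5-sonnet", FALLBACKS, fake_send)
print(result)
```

In the real proxy this logic lives server-side, so clients keep requesting claude-3-5-sonnet and never see the switch.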

Use from any OpenAI SDK

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",
    api_key="LITELLM_VIRTUAL_KEY_PLACEHOLDER",  # generated via /key/generate
)

# Same API, but routed through LiteLLM
resp = client.chat.completions.create(
    model="claude-3-5-sonnet",   # model_name from config.yaml
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
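
Because the proxy speaks the OpenAI wire format, the SDK call is equivalent to a plain HTTP POST. Here is a minimal sketch using only the standard library, assuming the proxy listens on http://localhost:4000 and serves the /chat/completions route that the OpenAI SDK targets. `build_chat_request` is a hypothetical helper, and nothing is sent until `urlopen` is called.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, api_key: str, model: str, prompt: str
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,  # the model_name alias from config.yaml
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "http://localhost:4000",
    "LITELLM_VIRTUAL_KEY_PLACEHOLDER",
    "claude-3-5-sonnet",
    "Hello",
)
print(req.get_method(), req.full_url)

# To actually send it (requires a running proxy):
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])
```

This is handy for languages or environments without an OpenAI SDK: any HTTP client that can send this JSON body can use the proxy.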

Per-team budgets and rate limits

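# Generate a key for team Acme with a $50 budget that resets every 30 days
curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{"team_id": "acme", "max_budget": 50, "budget_duration": "30d", "rpm_limit": 100}'

# Returns a JSON body containing the new key, e.g. {"key": "sk-acme-xyz123", ...}

LiteLLM tracks spend for every call, rejects requests once the budget is exhausted, and exposes spend reports through its /spend endpoints and admin UI.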
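
Key creation can also be scripted. The sketch below builds the same /key/generate request in Python and pulls the key out of a response body. `build_key_request` and `extract_key` are hypothetical helpers, the `budget_duration` field is an assumption about how to make the budget reset periodically, and the sample response reuses the illustrative key shown in this section rather than real output.

```python
import json
import urllib.request
from typing import Optional


def build_key_request(
    proxy_url: str,
    master_key: str,
    team_id: str,
    max_budget: float,
    rpm_limit: int,
    budget_duration: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST to the proxy's /key/generate endpoint."""
    body = {"team_id": team_id, "max_budget": max_budget, "rpm_limit": rpm_limit}
    if budget_duration is not None:
        # Assumption: a duration such as "30d" makes the budget reset periodically.
        body["budget_duration"] = budget_duration
    return urllib.request.Request(
        url=proxy_url.rstrip("/") + "/key/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_key(response_body: bytes) -> str:
    """Return the 'key' field from a /key/generate JSON response."""
    return json.loads(response_body)["key"]


key_req = build_key_request(
    "http://localhost:4000", "sk-1234", "acme", 50, 100, budget_duration="30d"
)
# Illustrative response body only, not captured from a real proxy.
sample_response = b'{"key": "sk-acme-xyz123"}'
new_key = extract_key(sample_response)
print(key_req.full_url, new_key)
```

Hand the extracted key to the team as their API key; the master key should stay with whoever administers the proxy.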

Connect Claude Code / Cursor

# Claude Code
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_API_KEY=sk-acme-xyz123

# Cursor — Settings > Custom OpenAI Base URL: http://localhost:4000/v1

FAQ

Q: Is LiteLLM Proxy free? A: Yes. It is open source under the MIT license, and you can self-host it with Docker for free. BerriAI also offers a hosted version (LiteLLM Cloud) with SLAs and managed observability for teams that don't want to run infra.

Q: How does this differ from OpenRouter or Portkey? A: OpenRouter is a hosted-only proxy. Portkey is primarily a hosted gateway with observability. LiteLLM is built primarily for open-source self-hosting, which gives you full control over auth, routing, and data. All three speak the OpenAI format.

Q: Will my non-OpenAI providers really 'just work' through the OpenAI SDK? A: Yes for chat completions. Tool calls work too (LiteLLM normalizes the format). Edge cases: streaming usage stats and provider-specific extensions (Anthropic prompt caching, OpenAI logprobs) need explicit support; most are wired up. Check docs.litellm.ai/docs/providers for your provider.
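
Tool calls go through the same chat completions endpoint using the OpenAI tools format. Below is a sketch of such a request body. The `get_weather` tool and the `weather_tool_payload` helper are hypothetical, and whether a given backend honors the tool depends on that provider's support.

```python
import json
from typing import Any, Dict


def weather_tool_payload(model: str, question: str) -> Dict[str, Any]:
    """Build an OpenAI-format chat request that offers one function tool."""
    return {
        "model": model,  # the model_name alias from config.yaml
        "messages": [{"role": "user", "content": question}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool name
                    "description": "Look up the current weather for a city.",
                    "parameters": {  # JSON Schema describing the arguments
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }


payload = weather_tool_payload("claude-3-5-sonnet", "What is the weather in Madrid?")
print(json.dumps(payload, indent=2))
```

With the OpenAI Python SDK, the same dictionary can be passed as `client.chat.completions.create(**payload)`; changing only the model value targets a different provider.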


Quick Use

  1. Save the YAML config from "Run the proxy" as config.yaml
  2. docker run -p 4000:4000 -v "$(pwd)/config.yaml:/app/config.yaml" -e ANTHROPIC_API_KEY -e OPENAI_API_KEY ghcr.io/berriai/litellm:main-stable --config /app/config.yaml
  3. Point any OpenAI SDK at http://localhost:4000
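
After step 3, a quick way to confirm the proxy is reachable is to list the models it serves. This sketch assumes the proxy answers the OpenAI-style /v1/models route on http://localhost:4000. `build_models_request` and `parse_model_ids` are hypothetical helpers, and the sample body is illustrative, not captured output.

```python
import json
import urllib.request
from typing import List


def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET for the OpenAI-style model list."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def parse_model_ids(response_body: bytes) -> List[str]:
    """Extract model ids from an OpenAI-format list response."""
    return [entry["id"] for entry in json.loads(response_body)["data"]]


models_req = build_models_request("http://localhost:4000", "sk-1234")

# Illustrative body in the OpenAI list format, using the aliases from config.yaml.
sample_body = json.dumps(
    {
        "object": "list",
        "data": [
            {"id": "claude-3-5-sonnet", "object": "model"},
            {"id": "gpt-4o", "object": "model"},
            {"id": "codestral", "object": "model"},
        ],
    }
).encode("utf-8")
model_ids = parse_model_ids(sample_body)
print(model_ids)

# To query a running proxy instead of the sample:
# with urllib.request.urlopen(models_req) as resp:
#     print(parse_model_ids(resp.read()))
```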

Source & Thanks

Built by BerriAI. Licensed under MIT.

BerriAI/litellm — ⭐ 17,000+
