Pal MCP Server — Multi-Model AI Gateway for Claude Code
MCP server that lets Claude Code use Gemini, OpenAI, Grok, and Ollama as a unified AI dev team. Features model routing, CLI-to-CLI bridge, and conversation continuity across 7+ providers.
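This asset is staged safely
This asset is staged safely before anything runs. The copied instruction tells the agent to read the staged files and ask for confirmation before activating scripts, MCP configuration, or global configuration.
npx -y tokrepo@latest install 09c904b2-4bf7-4f1e-acf5-55cd465b6227 --target codex
This stages the files first; read the staged README and install plan before activating.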
Why multi-model matters inside one agent
Claude Code is excellent at reasoning. Gemini 2.5 Pro has a 2M context window. GPT-4o is fast. Grok has live web access. Ollama runs offline. A real dev team uses all of them. Pal MCP collapses that into one tool call from Claude Code's perspective — ask it to "call Gemini on this 1.5M-token codebase" and Pal routes the request, returns the result, and maintains conversation continuity.
Single-config setup
Add to .mcp.json:
{
  "mcpServers": {
    "pal": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/BeehiveInnovations/pal-mcp-server.git", "pal-mcp-server"],
      "env": {
        "GEMINI_API_KEY": "your-gemini-key",
        "OPENAI_API_KEY": "your-openai-key",
        "DEFAULT_MODEL": "auto"
      }
    }
  }
}
Restart Claude Code. Now pal_chat, pal_route, and pal_continue are callable.
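If the project already has a .mcp.json with other servers in it, pasting the snippet by hand risks overwriting them. The following is a minimal sketch of merging the "pal" entry instead. The helper name merge_pal_entry is hypothetical and not part of Pal; the entry itself is copied from the config above, with placeholder keys.

```python
import json
from pathlib import Path

# The "pal" entry from the config above; the API key values are placeholders.
PAL_ENTRY = {
    "command": "uvx",
    "args": [
        "--from",
        "git+https://github.com/BeehiveInnovations/pal-mcp-server.git",
        "pal-mcp-server",
    ],
    "env": {
        "GEMINI_API_KEY": "your-gemini-key",
        "OPENAI_API_KEY": "your-openai-key",
        "DEFAULT_MODEL": "auto",
    },
}


def merge_pal_entry(config_path: Path) -> dict:
    """Add or replace the "pal" server in config_path, keeping other servers."""
    # Start from the existing file if there is one, otherwise from an empty config.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})["pal"] = PAL_ENTRY
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    merged = merge_pal_entry(Path(".mcp.json"))
    print(sorted(merged["mcpServers"]))
```

Running it twice is harmless: the second run simply rewrites the same "pal" entry.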
The routing logic
Set DEFAULT_MODEL=auto and Pal picks a model based on task heuristics:
| Task signal | Routed model | Why |
|---|---|---|
| Context > 200K tokens | Gemini 2.5 Pro | 2M context window |
| Needs live web facts | Grok | Twitter/X integration |
| Code completion loops | Ollama Codellama | Free, fast, local |
| Long reasoning chains | o3 | Best deliberation |
| Default | Claude Sonnet | Quality baseline |
Override per-call with pal_chat(model="gpt-4o").
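The routing table can be read as a first-match rule list. The sketch below is an illustration of that reading, not Pal's actual implementation: the Task fields, the order in which the signals are checked, and the model labels are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    """Hypothetical task signals; Pal's real heuristics are not documented here."""
    context_tokens: int = 0
    needs_live_web: bool = False
    is_completion_loop: bool = False
    needs_long_reasoning: bool = False


def route(task: Task, override: Optional[str] = None) -> str:
    """Return an illustrative model label for a task, first match wins."""
    if override:  # a per-call override beats every heuristic
        return override
    if task.context_tokens > 200_000:
        return "gemini-2.5-pro"
    if task.needs_live_web:
        return "grok"
    if task.is_completion_loop:
        return "ollama-codellama"
    if task.needs_long_reasoning:
        return "o3"
    return "claude-sonnet"  # default row of the table


if __name__ == "__main__":
    print(route(Task(context_tokens=1_500_000)))
    print(route(Task(needs_live_web=True), override="gpt-4o"))
```

The labels are plain strings for readability and should not be taken as real provider model IDs.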
CLI-to-CLI bridge
Pal exposes a raw CLI bridge: call Aider, Continue, or any CLI-based agent from within Claude Code. Useful for chaining specialized agents in a single workflow.
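How Pal invokes external CLIs is not described here, so the following is only a generic sketch of what a CLI bridge has to do: start a subprocess, pass it a prompt, enforce a timeout, and surface failures. The function run_cli is hypothetical, and a Python one-liner stands in for a real agent CLI.

```python
import subprocess
import sys


def run_cli(argv: list[str], prompt: str, timeout_s: float = 120.0) -> str:
    """Run a CLI agent, feed it the prompt on stdin, and return its stdout."""
    result = subprocess.run(
        argv,                 # argument list, so no shell quoting is involved
        input=prompt,
        capture_output=True,
        text=True,
        timeout=timeout_s,    # raises subprocess.TimeoutExpired if exceeded
        check=False,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in "agent": a Python one-liner that upper-cases its stdin.
    echo_agent = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
    print(run_cli(echo_agent, "review this diff"))
```

Passing the command as a list rather than a shell string keeps prompts with quotes or semicolons from being interpreted by a shell.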
Conversation continuity
Every Pal call can continue an existing thread:
pal_continue(thread_id="xyz", prompt="refactor based on Gemini's suggestions")
Thread state is persisted in SQLite at ~/.pal/threads.db, so threads survive restarts.
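To make the persistence idea concrete, here is a minimal sketch of a thread store backed by SQLite. The table name, columns, and function names are assumptions for illustration; the actual schema of threads.db is not documented here.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a thread store. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " thread_id TEXT NOT NULL,"
        " role TEXT NOT NULL,"
        " content TEXT NOT NULL)"
    )
    return conn


def append_message(conn: sqlite3.Connection, thread_id: str, role: str, content: str) -> None:
    """Append one message to a thread."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO messages (thread_id, role, content) VALUES (?, ?, ?)",
            (thread_id, role, content),
        )


def load_thread(conn: sqlite3.Connection, thread_id: str) -> list:
    """Return the thread's (role, content) pairs in insertion order."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE thread_id = ? ORDER BY id",
        (thread_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    store = open_store()
    append_message(store, "xyz", "user", "summarize the module")
    append_message(store, "xyz", "assistant", "it parses config files")
    print(load_thread(store, "xyz"))
```

Passing a file path instead of ":memory:" is what would make a thread outlive the process; continuing a thread then amounts to loading its history and sending it along with the new prompt.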
Supported providers in 2026
- Anthropic (Claude Opus, Sonnet, Haiku)
- OpenAI (GPT-4o, o3, o3-mini)
- Google (Gemini 2.5 Pro, Flash)
- xAI (Grok-3)
- DeepSeek (R1, V3)
- Ollama (local, 50+ models)
- LiteLLM (proxy for 100+ more)
Cost control
Pal emits a cost summary per session: total tokens, a per-model breakdown, and a dollar estimate. Set the MAX_COST_PER_SESSION environment variable (for example, MAX_COST_PER_SESSION=5) to hard-stop runaway loops.
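A session cap of this kind can be sketched as a small accumulator that raises once the estimate passes the limit. Everything below is illustrative: the class and function names are hypothetical, the per-1K-token prices are made-up inputs, and treating the MAX_COST_PER_SESSION value as a dollar amount is an assumption.

```python
import os


class BudgetExceeded(RuntimeError):
    """Raised when a session's estimated spend passes its cap."""


class SessionBudget:
    def __init__(self, max_cost: float) -> None:
        self.max_cost = max_cost
        self.cost_by_model: dict[str, float] = {}
        self.tokens_by_model: dict[str, int] = {}

    @property
    def total_cost(self) -> float:
        return sum(self.cost_by_model.values())

    def record(self, model: str, tokens: int, price_per_1k: float) -> None:
        """Add one call's usage, then stop the session if the cap is passed."""
        cost = tokens / 1000 * price_per_1k
        self.cost_by_model[model] = self.cost_by_model.get(model, 0.0) + cost
        self.tokens_by_model[model] = self.tokens_by_model.get(model, 0) + tokens
        if self.total_cost > self.max_cost:
            raise BudgetExceeded(
                f"estimated spend {self.total_cost:.2f} exceeds cap {self.max_cost:.2f}"
            )

    def summary(self) -> dict:
        """Total tokens, per-model cost breakdown, and overall estimate."""
        return {
            "total_tokens": sum(self.tokens_by_model.values()),
            "per_model": dict(self.cost_by_model),
            "estimate": self.total_cost,
        }


def budget_from_env(default: float = 5.0) -> SessionBudget:
    # Assumes the variable holds a plain number; the unit is an assumption.
    return SessionBudget(float(os.environ.get("MAX_COST_PER_SESSION", default)))
```

Checking the cap after each recorded call means the session can overshoot by at most one call's cost, which is usually an acceptable trade for not having to predict a call's price in advance.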
When Pal is not the right choice
- Single-model workflows — overhead not worth it, use the provider SDK directly.
- Production agents — MCP is still evolving; use LiteLLM Proxy for production-grade routing.
- Compliance-regulated environments — each upstream provider has different data policies; Pal doesn't unify compliance.
FAQ
LiteLLM is a Python proxy library designed for production backends. Pal is an MCP server designed for interactive use inside agents like Claude Code. Pal adds thread continuity and CLI bridging that LiteLLM does not provide, but LiteLLM has stronger production-grade features like retries and load balancing.
Pal works with local models: Ollama is a first-class provider. Point Pal at your local Ollama instance with OLLAMA_BASE_URL and it will route appropriate tasks to your local models. This is useful for offline work or privacy-sensitive data.
Pal is not limited to Claude Code. Any MCP-compatible client works, including Cursor, Codex CLI, Zed, and Cline. Because MCP is a standardized protocol, Pal exposes the same tools to each of them.
Spending can be capped. Set the MAX_COST_PER_SESSION environment variable to hard-stop sessions that exceed the limit. Pal also emits a cost summary so you can track spending.
With DEFAULT_MODEL=auto, Pal picks based on task heuristics — Gemini for huge context, Grok for live web facts, Ollama for local code completion, o3 for long reasoning, Claude Sonnet as the quality baseline.
References (3)
- Pal MCP GitHub: Supports Gemini, OpenAI, Grok, DeepSeek, Ollama, and LiteLLM proxy
- Model Context Protocol: MCP protocol specification by Anthropic
- Google DeepMind: Gemini 2.5 Pro has a 2M token context window
Source and credits
Created by BeehiveInnovations. Licensed under a custom license.
pal-mcp-server — ⭐ 11,300+