## Model Families

### Claude (Anthropic)
| Model | Best For | Context | Cost (input/output per M) |
|---|---|---|---|
| Opus 4 | Complex reasoning, architecture | 200K | $15 / $75 |
| Sonnet 4 | Daily coding, best value | 200K | $3 / $15 |
| Haiku | Fast tasks, classification | 200K | $0.25 / $1.25 |
**Strengths:** Best at following complex instructions, careful code generation, and long-form reasoning. **Weaknesses:** No image generation; slower than GPT-4o for simple tasks.
### GPT (OpenAI)
| Model | Best For | Context | Cost (input/output per M) |
|---|---|---|---|
| o3 | Complex math, science | 200K | $10 / $40 |
| GPT-4o | General purpose, fast | 128K | $2.50 / $10 |
| GPT-4o mini | Budget tasks | 128K | $0.15 / $0.60 |
**Strengths:** Fastest responses, broadest training data, best image understanding. **Weaknesses:** Less careful than Claude on complex code and more prone to hallucination.
### Gemini (Google)
| Model | Best For | Context | Cost (input/output per M) |
|---|---|---|---|
| 2.5 Pro | Long docs, research | 1M | $1.25 / $5 |
| 2.5 Flash | Speed-critical tasks | 1M | $0.075 / $0.30 |
**Strengths:** Massive 1M-token context window, cheapest per token, multimodal. **Weaknesses:** Less reliable at following complex instructions; occasional refusals.
### Open-Source
| Model | Best For | Parameters | License |
|---|---|---|---|
| Llama 3.1 | General, self-hosted | 8B / 70B / 405B | Llama Community License |
| Mistral Large | European compliance | 123B | Mistral Research License |
| Codestral | Code completion | 22B | Custom |
| DeepSeek V3 | Budget alternative | 671B MoE | MIT |
| Qwen 2.5 | Multilingual, math | 72B | Apache 2.0 |
Run locally with Ollama: `ollama run llama3.1:70b`
## Decision Framework

### By Task Type
**Coding (complex refactoring, architecture):**
- Claude Sonnet 4 (best value)
- Claude Opus 4 (highest quality)
- GPT-4o (fastest)
**Coding (autocomplete, inline suggestions):**
- Codestral (specialized)
- GPT-4o mini (cheapest)
- Claude Haiku (fast + capable)
**RAG / Document Q&A:**
- Gemini 2.5 Pro (1M context)
- Claude Sonnet 4 (best instruction following)
- GPT-4o (good balance)
**Data Analysis:**
- Claude Opus 4 (careful reasoning)
- o3 (math-heavy tasks)
- GPT-4o (visualization descriptions)
**Chat / Customer Support:**
- Claude Haiku (fast, cheap, good)
- GPT-4o mini (cheapest)
- Gemini Flash (very cheap)
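The task-type recommendations above can be encoded as a simple routing table with fallbacks. A minimal sketch; the category keys, model name strings, and the `pick_model` helper are illustrative, not any vendor's API:

```python
# Preference-ordered model choices per task category, mirroring the lists above.
# Model identifiers here are illustrative labels, not exact API model IDs.
ROUTES = {
    "coding-complex": ["claude-sonnet-4", "claude-opus-4", "gpt-4o"],
    "coding-autocomplete": ["codestral", "gpt-4o-mini", "claude-haiku"],
    "rag": ["gemini-2.5-pro", "claude-sonnet-4", "gpt-4o"],
    "data-analysis": ["claude-opus-4", "o3", "gpt-4o"],
    "support-chat": ["claude-haiku", "gpt-4o-mini", "gemini-2.5-flash"],
}

def pick_model(task: str, unavailable: set[str] = frozenset()) -> str:
    """Return the highest-ranked model for a task, skipping unavailable ones."""
    for model in ROUTES[task]:
        if model not in unavailable:
            return model
    raise LookupError(f"no model available for task {task!r}")
```

For example, `pick_model("coding-complex", unavailable={"claude-sonnet-4"})` falls through to the second choice, Claude Opus 4, which is the point of keeping an ordered list rather than a single pick.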
### By Budget
| Monthly Budget | Recommendation |
|---|---|
| <$10 | GPT-4o mini or Gemini Flash |
| $10-50 | Claude Sonnet 4 |
| $50-200 | Claude Sonnet (daily) + Opus (complex) |
| $200+ | Claude Opus for everything |
| $0 (local) | Llama 3.1 70B via Ollama |
### By Privacy Requirements
| Requirement | Recommendation |
|---|---|
| Data stays local | Ollama + Llama/Mistral |
| No data training | Claude (no training on API data) |
| EU data residency | Mistral (EU-hosted) |
| HIPAA compliance | Azure OpenAI or Claude API |
## Cost Comparison (per 1M tokens)
| Model | Input | Output | Relative Cost |
|---|---|---|---|
| Gemini Flash | $0.075 | $0.30 | $ (cheapest) |
| GPT-4o mini | $0.15 | $0.60 | $ |
| Claude Haiku | $0.25 | $1.25 | $ |
| Gemini Pro | $1.25 | $5.00 | $$ |
| GPT-4o | $2.50 | $10.00 | $$$ |
| Claude Sonnet | $3.00 | $15.00 | $$$ |
| o3 | $10.00 | $40.00 | $$$$ |
| Claude Opus | $15.00 | $75.00 | $$$$$ |
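The table translates directly into a monthly spend estimate. A minimal sketch with prices hardcoded from the table above (verify against the providers' current pricing pages; the `monthly_cost` helper is illustrative):

```python
# Per-1M-token (input, output) prices in USD, copied from the table above.
PRICES = {
    "gemini-flash": (0.075, 0.30),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-haiku": (0.25, 1.25),
    "gemini-pro": (1.25, 5.00),
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet": (3.00, 15.00),
    "o3": (10.00, 40.00),
    "claude-opus": (15.00, 75.00),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate monthly spend in USD for the given token volume."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
```

At 10M input and 2M output tokens a month, Claude Sonnet works out to $60 (consistent with the $50-200 budget tier above), while the same volume on Gemini Flash is about $1.35.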
## FAQ
**Q: Which model is best overall?** A: Claude Sonnet 4 for the best quality/cost ratio. GPT-4o if speed matters most. Gemini 2.5 Pro if you need 1M context.

**Q: Should I use one model or multiple?** A: Use multiple. Route simple tasks to cheap models (Haiku, 4o mini) and complex tasks to capable models (Sonnet, Opus).

**Q: Are open-source models good enough?** A: Llama 3.1 70B is competitive with GPT-4 for many tasks. For the best quality, cloud models still lead.
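The cheap-first routing advice can be made concrete with a tiny heuristic that escalates only when a prompt looks complex. This sketch classifies by length and a keyword list; the threshold and keywords are invented for illustration (production routers typically use a small classifier model instead):

```python
# Keywords that hint a prompt needs a capable model. Illustrative only.
COMPLEX_HINTS = ("refactor", "architecture", "prove", "debug", "analyze")

def choose_tier(prompt: str) -> str:
    """Send short, keyword-free prompts to a cheap model; escalate the rest."""
    text = prompt.lower()
    if len(text) > 500 or any(hint in text for hint in COMPLEX_HINTS):
        return "claude-sonnet-4"  # capable tier for long or complex prompts
    return "claude-haiku"         # cheap tier for simple lookups and chat
```

Even a crude gate like this can cut costs substantially when most traffic is simple, since Haiku's output tokens cost a fraction of Sonnet's.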