Quick Use
- Hosted: same DeepSeek API key, set `model="deepseek-reasoner"`
- Local: `ollama pull deepseek-r1:7b && ollama run deepseek-r1:7b`
- Print `response.choices[0].message.reasoning_content` to see the full chain-of-thought
Intro
DeepSeek-R1 is the open-weight reasoning model that achieves o1-level performance on AIME / MATH / GPQA / Codeforces while shipping its full chain-of-thought to the user. Distilled smaller versions (1.5B, 7B, 32B, 70B) make local reasoning practical on consumer hardware. MIT license, full weights public.

Best for: hard reasoning tasks (math, science, complex code) where you need a reasoning model but want open weights.
Works with: DeepSeek API, Ollama (distilled), vLLM, llama.cpp.
Setup time: 2 minutes.
Hosted API
```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com/v1",
    api_key=os.environ["DEEPSEEK_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # R1
    messages=[{"role": "user", "content":
               "Prove that the square root of 2 is irrational"}],
)

# R1 returns reasoning + final answer in separate fields
for choice in response.choices:
    print("REASONING:", choice.message.reasoning_content)
    print("ANSWER:", choice.message.content)
```

Unlike o1, R1's reasoning is visible — useful for debugging, education, and trust.
Local via Ollama (distilled)
```shell
ollama pull deepseek-r1:1.5b   # ~1GB, runs on a laptop
ollama pull deepseek-r1:7b     # ~5GB
ollama pull deepseek-r1:14b    # ~9GB
ollama pull deepseek-r1:32b    # ~20GB, M2 Max territory
ollama pull deepseek-r1:70b    # ~40GB, beefy server
```

The 7B distillation often outperforms GPT-4o on competition math while being free and fast on a single 4090.
When to use R1 vs V3
| Task | Pick |
|---|---|
| Math proofs, competition problems | R1 |
| Step-by-step debugging | R1 |
| Quick chitchat, summaries | V3 (cheaper, faster) |
| Tool-use heavy agent | V3 (R1's tool support is weaker) |
| Need visible CoT for audit | R1 |
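If you call both models from the same codebase, the table above collapses into a one-line router. A sketch with illustrative task labels (the set and function name are ours); `deepseek-reasoner` is R1 and `deepseek-chat` is V3:

```python
# Tasks from the table that warrant a reasoning model.
REASONING_TASKS = {"math_proof", "competition_math", "debugging", "audited_cot"}

def pick_model(task: str) -> str:
    """Route hard reasoning tasks to R1, everything else to V3."""
    return "deepseek-reasoner" if task in REASONING_TASKS else "deepseek-chat"
```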
Pricing
| Source | Input $/1M tok | Output $/1M tok |
|---|---|---|
| DeepSeek API | $0.55 | $2.19 |
| OpenAI o1 (compare) | $15.00 | $60.00 |
| OpenAI o1-mini (compare) | $3.00 | $12.00 |
| Local distilled | $0 | $0 |
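At the list prices above, the gap compounds on reasoning workloads because R1-style requests are output-heavy. A back-of-envelope helper (rates hardcoded from the table; verify against current pricing pages before relying on them):

```python
def cost_usd(in_tok: int, out_tok: int, in_rate: float, out_rate: float) -> float:
    """Cost in USD for one request, with rates in $ per 1M tokens."""
    return (in_tok * in_rate + out_tok * out_rate) / 1_000_000

# A reasoning-heavy request: 2K prompt tokens, 8K output tokens.
r1 = cost_usd(2_000, 8_000, 0.55, 2.19)    # DeepSeek API: ~$0.019
o1 = cost_usd(2_000, 8_000, 15.00, 60.00)  # OpenAI o1:    ~$0.51
# Same request costs roughly 27x more on o1 at these list prices.
```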
FAQ
Q: Why does R1 show its reasoning when o1 hides it?
A: DeepSeek published the full RL training methodology. Visible CoT is part of the value proposition — auditability, debugging, education. OpenAI considers o1's CoT a competitive moat.

Q: How much slower is R1 vs V3?
A: R1 spends extra tokens on reasoning before the final answer — typically 3-10× more output tokens, so 3-10× slower wall-clock latency on equal infra. The cost difference reflects this.
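That multiplier is easy to sanity-check: for decode-bound generation, wall-clock latency is roughly output tokens divided by decode speed. A minimal sketch with illustrative numbers:

```python
def wall_clock_s(output_tokens: int, tok_per_s: float) -> float:
    """Decode-bound latency estimate: output tokens / decode speed."""
    return output_tokens / tok_per_s

# A V3-style answer of ~500 tokens vs an R1 answer with 5x the
# output tokens, at the same (illustrative) 50 tok/s decode rate:
v3 = wall_clock_s(500, 50.0)    # 10.0 s
r1 = wall_clock_s(2_500, 50.0)  # 50.0 s — 5x the latency
```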
Q: Are the distilled R1 versions trained from scratch?
A: No — they're knowledge-distilled from full R1 outputs into Llama and Qwen base models: the 1.5B and 7B distills build on Qwen2.5-Math, the 32B on Qwen2.5-32B, and the 70B on Llama-3.3-70B. Each distill inherits the strengths and limits of its base model.
Source & Thanks
Built by DeepSeek. Weights MIT-licensed.
deepseek-ai/DeepSeek-R1 — ⭐ 90,000+