# DeepSeek-V3 — Open-Weight 671B MoE Model with GPT-4o Quality

> DeepSeek-V3 is a 671B-parameter MoE model (37B active per token). Matches GPT-4o on benchmarks. MIT-licensed weights, $0.27/1M input tokens on the hosted API.

## Install

Copy the content below into your project.

## Quick Use

1. Sign up at platform.deepseek.com → API key
2. Set the OpenAI SDK `base_url` to `https://api.deepseek.com/v1`
3. Use `model="deepseek-chat"` — a drop-in replacement for GPT-4o code

---

## Intro

DeepSeek-V3 is the 671B-parameter mixture-of-experts model that put DeepSeek on the global map — it matches GPT-4o on most benchmarks while activating only 37B parameters per token. The weights are MIT-licensed (download and run anywhere). The hosted API costs $0.27 per 1M input tokens — about 10× cheaper than GPT-4o.

Best for: cost-sensitive production where you'd otherwise use GPT-4o.

Works with: the DeepSeek API (OpenAI-compatible), local inference via Ollama / vLLM / llama.cpp, AWS Bedrock.

Setup time: 2 minutes.

---

### Hosted API (OpenAI-compatible)

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com/v1",
    api_key=os.environ["DEEPSEEK_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-chat",  # alias for DeepSeek-V3
    messages=[{"role": "user", "content": "Compare LFP vs NMC battery chemistries"}],
    temperature=0.3,
)
print(response.choices[0].message.content)
```

A drop-in for any OpenAI SDK code — switch `base_url` and `model`, and everything else works (tool use, JSON mode, streaming).

### Local via Ollama

```bash
# Pull a quantized version (the full 671B model is ~700GB!)
ollama pull deepseek-v3:8b    # ~5GB, 8B distilled
ollama pull deepseek-v3:32b   # ~20GB, 32B distilled
ollama pull deepseek-v3:671b  # ~700GB, full BF16 — needs 8× H100
```

Most personal users want the 8B or 32B distilled variants — they capture much of V3's reasoning at hobbyist hardware cost.
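As a rough rule of thumb, the choice between variants can be sketched as a small helper. The function name and memory thresholds are illustrative, derived from the approximate download sizes quoted above, not from any official sizing guide:

```python
def pick_deepseek_tag(mem_gb: float) -> str:
    """Pick an Ollama tag for deepseek-v3 from available GPU/unified memory.

    Thresholds are rough rules of thumb based on the download sizes above:
    ~5GB for the 8B distill, ~20GB for the 32B distill, ~700GB for full BF16.
    """
    if mem_gb >= 700:
        return "deepseek-v3:671b"  # full BF16, 8x H100 territory
    if mem_gb >= 24:
        return "deepseek-v3:32b"   # 32B distilled, ~20GB download
    if mem_gb >= 8:
        return "deepseek-v3:8b"    # 8B distilled, ~5GB download
    raise ValueError("not enough memory for any deepseek-v3 variant")

print(pick_deepseek_tag(16))  # a 16GB laptop gets the 8B distill
```

Then `ollama pull` the returned tag.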
### Local via vLLM (production)

```bash
pip install vllm
python -m vllm.entrypoints.openai.api_server \
  --model deepseek-ai/DeepSeek-V3 \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.95
```

Requires 8× H100 (or equivalent ~640GB of GPU memory) for the full model. The API endpoint is OpenAI-compatible.

### Pricing snapshot

| Source | Input $/1M tok | Output $/1M tok |
|---|---|---|
| DeepSeek API | $0.27 | $1.10 |
| OpenRouter | $0.27 | $1.10 |
| GPT-4o (compare) | $2.50 | $10.00 |
| Claude 3.5 Sonnet (compare) | $3.00 | $15.00 |
| Local (vLLM) | $0 (after hardware) | $0 |

---

### FAQ

**Q: Is DeepSeek-V3 free?**
A: Weights: yes, MIT-licensed. Hosted API: paid but cheap (~$0.27/1M input tokens). Local inference: free after you cover the hardware. Most users start with the hosted API for prototyping, then switch to local or self-hosted inference once volume justifies it.

**Q: Is V3 actually as good as GPT-4o?**
A: On most benchmarks (MMLU, GPQA, HumanEval, MATH) it's within 1-3 points. On some specialized tasks (vision, latest news) where GPT-4o has more recent training data or extra modalities, V3 lags. For general reasoning + code, the gap is small.

**Q: Are there privacy concerns?**
A: DeepSeek's hosted API stores prompts per its privacy policy. For sensitive workloads, run locally or via a privacy-respecting host (Together, Fireworks, your own vLLM). The MIT license makes self-hosting fully legal.

---

## Source & Thanks

> Built by [DeepSeek](https://github.com/deepseek-ai). Weights MIT-licensed.
>
> [deepseek-ai/DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) — ⭐ 80,000+
---

Source: https://tokrepo.com/en/workflows/deepseek-v3-open-weight-671b-moe-model-with-gpt-4o-quality
Author: DeepSeek