# DeepSeek-V3 — Open-Weight 671B MoE Model with GPT-4o Quality

> DeepSeek-V3 is a 671B-parameter MoE model (37B active per token). Matches GPT-4o on benchmarks. MIT-licensed weights, $0.27/1M input tokens on the hosted API.

## Install

Copy the content below into your project.

## Quick Use

1. Sign up at platform.deepseek.com → API key
2. Set the OpenAI SDK `base_url` to `https://api.deepseek.com/v1`
3. Use `model="deepseek-chat"` — a drop-in replacement for GPT-4o code

---

## Intro

DeepSeek-V3 is the 671B-parameter mixture-of-experts model that put DeepSeek on the global map — it matches GPT-4o on most benchmarks while activating only 37B parameters per token. The weights are MIT-licensed (download and run anywhere). The hosted API costs $0.27 per 1M input tokens — about 10× cheaper than GPT-4o.

Best for: cost-sensitive production where you'd otherwise use GPT-4o.

Works with: the DeepSeek API (OpenAI-compatible), local inference via Ollama / vLLM / llama.cpp, AWS Bedrock.

Setup time: 2 minutes.

---

### Hosted API (OpenAI-compatible)

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com/v1",
    api_key=os.environ["DEEPSEEK_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-chat",  # alias for DeepSeek-V3
    messages=[{"role": "user", "content": "Compare LFP vs NMC battery chemistries"}],
    temperature=0.3,
)
print(response.choices[0].message.content)
```

A drop-in for any OpenAI SDK code — switch `base_url` and `model`, and everything else works (tool use, JSON mode, streaming).

### Local via Ollama

```bash
# Pull a quantized version (the full 671B model is ~700GB!)
ollama pull deepseek-v3:8b    # ~5GB, 8B distilled
ollama pull deepseek-v3:32b   # ~20GB, 32B distilled
ollama pull deepseek-v3:671b  # ~700GB, full BF16 — needs 8× H100
```

Most personal users want the 8B or 32B distilled variants — they capture much of V3's reasoning at hobbyist hardware cost.
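As a rough rule of thumb, the choice between variants can be sketched as a small helper. The function name and memory thresholds are illustrative, derived from the approximate download sizes quoted above, not from any official sizing guide:

```python
def pick_deepseek_tag(mem_gb: float) -> str:
    """Pick an Ollama tag for deepseek-v3 from available GPU/unified memory.

    Thresholds are rough rules of thumb based on the download sizes above:
    ~5GB for the 8B distill, ~20GB for the 32B distill, ~700GB for full BF16.
    """
    if mem_gb >= 700:
        return "deepseek-v3:671b"  # full BF16, 8x H100 territory
    if mem_gb >= 24:
        return "deepseek-v3:32b"   # 32B distilled, ~20GB download
    if mem_gb >= 8:
        return "deepseek-v3:8b"    # 8B distilled, ~5GB download
    raise ValueError("not enough memory for any deepseek-v3 variant")

print(pick_deepseek_tag(16))  # a 16GB laptop gets the 8B distill
```

Then `ollama pull` the returned tag.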
### Local via vLLM (production)

```bash
pip install vllm
python -m vllm.entrypoints.openai.api_server \
  --model deepseek-ai/DeepSeek-V3 \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.95
```

Requires 8× H100 (or equivalent ~640GB of GPU memory) for the full model. The API endpoint is OpenAI-compatible.

### Pricing snapshot

| Source | Input $/1M tok | Output $/1M tok |
|---|---|---|
| DeepSeek API | $0.27 | $1.10 |
| OpenRouter | $0.27 | $1.10 |
| GPT-4o (compare) | $2.50 | $10.00 |
| Claude 3.5 Sonnet (compare) | $3.00 | $15.00 |
| Local (vLLM) | $0 (after hardware) | $0 |

---

### FAQ

**Q: Is DeepSeek-V3 free?**
A: Weights: yes, MIT-licensed. Hosted API: paid but cheap (~$0.27/1M input tokens). Local inference: free after you cover the hardware. Most users start with the hosted API for prototyping, then switch to local or self-hosted inference once volume justifies it.

**Q: Is V3 actually as good as GPT-4o?**
A: On most benchmarks (MMLU, GPQA, HumanEval, MATH) it's within 1-3 points. On some specialized tasks (vision, latest news) where GPT-4o has more recent training data or extra modalities, V3 lags. For general reasoning + code, the gap is small.

**Q: Are there privacy concerns?**
A: DeepSeek's hosted API stores prompts per its privacy policy. For sensitive workloads, run locally or via a privacy-respecting host (Together, Fireworks, your own vLLM). The MIT license makes self-hosting fully legal.

---

## Source & Thanks

> Built by [DeepSeek](https://github.com/deepseek-ai). Weights MIT-licensed.
>
> [deepseek-ai/DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) — ⭐ 80,000+
---

Source: https://tokrepo.com/en/workflows/deepseek-v3-open-weight-671b-moe-model-with-gpt-4o-quality
Author: DeepSeek