Knowledge · May 8, 2026 · 4 min read

DeepSeek-V3 — Open-Weight 671B MoE Model with GPT-4o Quality

DeepSeek-V3 is a 671B-parameter MoE model (37B active per token) that comes within a few points of GPT-4o on most benchmarks. The weights are MIT-licensed, and the hosted API costs $0.27 per 1M input tokens.

Safe staging for this asset

This asset is staged first. The copied prompt tells the agent to inspect the staged files and ask before activating scripts, MCP config, or global config.

Safe staging command
npx -y tokrepo@latest install 1b0d1ab2-1edb-49e1-9853-b02807a64140 --target codex

Stages files first; activation requires review of the staged README and plan.

Intro

DeepSeek-V3 is the 671B-parameter mixture-of-experts model that put DeepSeek on the global map: it matches GPT-4o on most benchmarks while activating only 37B parameters per token. The weights are MIT-licensed as of the V3-0324 checkpoint (the original December 2024 release shipped under a separate DeepSeek model license), so you can download and run them anywhere. The hosted API costs $0.27 per 1M input tokens, roughly 9× cheaper than GPT-4o's $2.50.

Best for: cost-sensitive production where you'd otherwise use GPT-4o.
Works with: DeepSeek API (OpenAI-compatible), local via Ollama / vLLM / llama.cpp, AWS Bedrock.
Setup time: about 2 minutes for the hosted API.


Hosted API (OpenAI-compatible)

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com/v1",
    api_key=os.environ["DEEPSEEK_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-chat",  # alias for DeepSeek-V3
    messages=[{"role": "user", "content": "Compare LFP vs NMC battery chemistries"}],
    temperature=0.3,
)

print(response.choices[0].message.content)

This is a drop-in replacement for existing OpenAI SDK code: change base_url and model, and the rest of the client code stays the same, including tool use, JSON mode, and streaming.
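Streaming can be sketched as follows. This is a minimal sketch, assuming the chunk shape used by the OpenAI Python SDK (`choices[0].delta.content`); the helper names `collect_stream`, `stream_reply`, and `fake_chunk` are illustrative and not part of any SDK. The client is passed in as an argument so the assembly logic can be exercised offline with stand-in chunks.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any]) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:      # some chunks may carry no choices at all
            continue
        delta = chunk.choices[0].delta.content
        if delta:                  # role-only and final chunks have content=None
            parts.append(delta)
    return "".join(parts)


def stream_reply(client: Any, prompt: str, model: str = "deepseek-chat") -> str:
    """Request a streamed completion from `client` and return the full text."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,               # ask the server to send incremental chunks
    )
    return collect_stream(stream)


def fake_chunk(text):
    """Stand-in object shaped like a streamed chunk (hypothetical, for offline checks)."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )
```

With a real client, `stream_reply(client, "Compare LFP vs NMC battery chemistries")` would return the assembled answer; to display tokens as they arrive, print each delta inside the loop instead of collecting them.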

Local via Ollama

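# Full model. The Ollama library lists V3 at 671B only; the default
# 4-bit build is roughly 400GB (check the library page for current tags).
ollama pull deepseek-v3:671b

# Smaller distilled models are published under DeepSeek-R1, not V3
ollama pull deepseek-r1:8b      # ~5GB
ollama pull deepseek-r1:32b     # ~20GB

DeepSeek has not released distilled V3 checkpoints, so on hobbyist hardware the practical choice is an 8B or 32B DeepSeek-R1 distill. These are smaller Llama- and Qwen-based models fine-tuned on R1 outputs, not reduced copies of V3, so expect lower quality than the full model.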

Local via vLLM (production)

pip install vllm
python -m vllm.entrypoints.openai.api_server \
  --model deepseek-ai/DeepSeek-V3 \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.95

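The native FP8 weights alone total roughly 700GB, which exceeds the 640GB available on an 8× H100 (80GB) node, so plan for 8× H200-class hardware or a multi-node H100 setup for the full model. The API endpoint is OpenAI-compatible.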
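Weight memory can be estimated from the parameter count and the bytes stored per parameter. The sketch below is a rough, weight-only estimate under stated assumptions: it ignores KV cache, activations, and framework overhead, so real requirements are higher. The function names are illustrative, and the default `utilization` mirrors the `--gpu-memory-utilization 0.95` flag used above.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Weight-only memory in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


def fits_on_node(
    n_params: float,
    bytes_per_param: float,
    n_gpus: int,
    gpu_mem_gb: float,
    utilization: float = 0.95,
) -> bool:
    """True if the weights alone fit in the usable memory of one node."""
    usable_gb = n_gpus * gpu_mem_gb * utilization
    return weight_memory_gb(n_params, bytes_per_param) <= usable_gb
```

For example, `fits_on_node(671e9, 1, 8, 141)` checks FP8 weights (1 byte per parameter) against eight 141GB GPUs. Treat a passing result as a necessary condition only, since serving also needs headroom for the KV cache.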

Pricing snapshot

| Source | Input ($/1M tokens) | Output ($/1M tokens) |
| --- | --- | --- |
| DeepSeek API | $0.27 | $1.10 |
| OpenRouter | $0.27 | $1.10 |
| GPT-4o (compare) | $2.50 | $10.00 |
| Claude 3.5 Sonnet (compare) | $3.00 | $15.00 |
| Local (vLLM) | $0 (after hardware) | $0 |
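To turn the per-token prices into a monthly figure, a small calculator helps. This is a minimal sketch using the prices from the snapshot above; the dictionary keys and the `estimate_cost` name are illustrative labels, and the numbers should be rechecked against current provider pricing before budgeting.

```python
# USD per 1M tokens as (input, output), copied from the pricing snapshot.
PRICES = {
    "deepseek-api": (0.27, 1.10),
    "openrouter": (0.27, 1.10),
    "gpt-4o": (2.50, 10.00),
    "claude-3.5-sonnet": (3.00, 15.00),
}


def estimate_cost(source: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    input_price, output_price = PRICES[source]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_ratio(baseline: str, candidate: str, input_tokens: int, output_tokens: int) -> float:
    """How many times more the baseline costs than the candidate for the same traffic."""
    return estimate_cost(baseline, input_tokens, output_tokens) / estimate_cost(
        candidate, input_tokens, output_tokens
    )
```

Because input and output are priced differently, the savings depend on the traffic mix; `cost_ratio("gpt-4o", "deepseek-api", input_tokens, output_tokens)` gives the multiplier for a specific workload.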

FAQ

Q: Is DeepSeek-V3 free? A: The weights are free to download under the MIT license. The hosted API is paid but cheap (~$0.27 per 1M input tokens). Local inference has no per-token fee once you have covered the hardware. Most users prototype on the hosted API and move to self-hosting once volume justifies it.

Q: Is V3 actually as good as GPT-4o? A: On most benchmarks (MMLU, GPQA, HumanEval, MATH) it is within 1-3 points. V3 lags on tasks that need capabilities GPT-4o has and V3 lacks, such as vision input or more recent training data. For general reasoning and code, the gap is small.

Q: Are there privacy concerns? A: DeepSeek's hosted API stores prompts in accordance with its privacy policy. For sensitive workloads, run the model on your own vLLM deployment or through a host such as Together or Fireworks after reviewing its data policy. The MIT license permits self-hosting.
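One way to act on the privacy answer is to route sensitive requests to a self-hosted endpoint and everything else to the hosted API. The sketch below assumes a local vLLM server started as shown earlier, listening on vLLM's default port 8000 and serving the model under the name passed to `--model`; `ENDPOINTS` and `client_settings` are hypothetical names introduced for illustration.

```python
import os
from typing import Mapping, Optional

ENDPOINTS = {
    "hosted": {
        "base_url": "https://api.deepseek.com/v1",
        "model": "deepseek-chat",
        "key_env": "DEEPSEEK_API_KEY",
    },
    "local": {
        "base_url": "http://localhost:8000/v1",   # assumed vLLM default address
        "model": "deepseek-ai/DeepSeek-V3",
        "key_env": None,
    },
}


def client_settings(sensitive: bool, env: Optional[Mapping[str, str]] = None) -> dict:
    """Pick endpoint settings: local for sensitive prompts, hosted otherwise."""
    env = os.environ if env is None else env
    target = ENDPOINTS["local" if sensitive else "hosted"]
    # A vLLM server started without --api-key accepts any key;
    # "EMPTY" is used here as a placeholder value.
    api_key = env[target["key_env"]] if target["key_env"] else "EMPTY"
    return {
        "base_url": target["base_url"],
        "api_key": api_key,
        "model": target["model"],
    }
```

The returned `base_url` and `api_key` can be passed to `OpenAI(...)`, and `model` to `chat.completions.create(...)`, so the calling code stays identical for both targets.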


Quick Use

  1. Sign up at platform.deepseek.com → API key
  2. Set OpenAI SDK base_url to https://api.deepseek.com/v1
  3. Use model="deepseek-chat" — drop-in for GPT-4o code

Source & Thanks

Built by DeepSeek. Weights MIT-licensed.

deepseek-ai/DeepSeek-V3 — ⭐ 80,000+
