Knowledge · May 8, 2026 · 4 min read

DeepSeek-V3 — Open-Weight 671B MoE Model with GPT-4o Quality

DeepSeek-V3 is a 671B-parameter MoE model (37B active per token) that matches GPT-4o on most benchmarks. MIT-licensed weights; $0.27/1M input tokens on the hosted API.

Agent ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, install contract, metadata JSON, adapter-aware plan, and raw content links so agents can judge fit, risk, and next actions.

Stage only · 15/100
Agent surface: Any MCP/CLI agent
Kind: Knowledge
Install: Stage only
Trust: New
Entrypoint: Asset

Universal CLI install command
npx tokrepo install 1b0d1ab2-1edb-49e1-9853-b02807a64140
Intro

DeepSeek-V3 is the 671B-parameter mixture-of-experts model that put DeepSeek on the global map: it matches GPT-4o on most benchmarks while activating only 37B parameters per token. Weights are MIT-licensed, so you can download and run them anywhere. The hosted API costs $0.27 per 1M input tokens, roughly 10× cheaper than GPT-4o. Best for: cost-sensitive production workloads where you'd otherwise use GPT-4o. Works with: the DeepSeek API (OpenAI-compatible), local inference via Ollama / vLLM / llama.cpp, and AWS Bedrock. Setup time: 2 minutes.


Hosted API (OpenAI-compatible)

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com/v1",
    api_key=os.environ["DEEPSEEK_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-chat",  # alias for DeepSeek-V3
    messages=[{"role": "user", "content": "Compare LFP vs NMC battery chemistries"}],
    temperature=0.3,
)

print(response.choices[0].message.content)

Drop-in for any OpenAI SDK code — switch base_url and model, everything else works (tool use, JSON mode, streaming).
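
Streaming and JSON mode use the same OpenAI-style calls. A minimal streaming sketch, reusing the client above (the prompt is illustrative):

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Summarize LFP vs NMC in 3 bullets"}],
    stream=True,  # tokens arrive as chunks instead of one response
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

For JSON mode, pass response_format={"type": "json_object"} in the same call, following the OpenAI-compatible contract.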

Local via Ollama

# Pull a distilled or quantized variant (the full 671B model is ~700GB!)
ollama pull deepseek-v3:8b      # ~5GB, 8B distilled
ollama pull deepseek-v3:32b     # ~20GB, 32B distilled
ollama pull deepseek-v3:671b    # ~700GB, full model; needs 8× H100-class GPUs

Most personal users want the 8B or 32B distilled variants — they capture much of V3's reasoning at hobbyist hardware cost.
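
Ollama also exposes an OpenAI-compatible endpoint on localhost, so the hosted-API snippet above works locally. A minimal sketch, assuming you pulled the 32b tag:

from openai import OpenAI

# Ollama serves the OpenAI protocol on port 11434; the SDK requires a key
# but Ollama ignores it
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="deepseek-v3:32b",  # whichever tag you pulled above
    messages=[{"role": "user", "content": "Explain MoE routing in two sentences"}],
)
print(resp.choices[0].message.content)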

Local via vLLM (production)

pip install vllm

# --trust-remote-code may be needed for DeepSeek's custom model code on Hugging Face
python -m vllm.entrypoints.openai.api_server \
  --model deepseek-ai/DeepSeek-V3 \
  --trust-remote-code \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.95

Requires 8× H100 (or equivalent ~640GB GPU memory) for the full model. The API endpoint is OpenAI-compatible.
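
Once the server is up, any OpenAI client can talk to it. A minimal sketch, assuming vLLM's default port 8000 on the same host:

from openai import OpenAI

# vLLM speaks the OpenAI protocol; the key is required by the SDK but unused
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)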

Pricing snapshot

Source                         Input ($/1M tok)       Output ($/1M tok)
DeepSeek API                   $0.27                  $1.10
OpenRouter                     $0.27                  $1.10
GPT-4o (compare)               $2.50                  $10.00
Claude 3.5 Sonnet (compare)    $3.00                  $15.00
Local (vLLM)                   $0 (after hardware)    $0
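
To make the gap concrete, a back-of-envelope calculation from the table above, for a hypothetical workload of 10M input + 2M output tokens per day:

# Illustrative arithmetic only; prices copied from the table above
PRICES = {  # $ per 1M tokens: (input, output)
    "deepseek-chat": (0.27, 1.10),
    "gpt-4o": (2.50, 10.00),
}

def monthly_cost(model, in_m=10, out_m=2, days=30):
    p_in, p_out = PRICES[model]
    return days * (in_m * p_in + out_m * p_out)

print(monthly_cost("deepseek-chat"))  # 147.0   (~$147/month)
print(monthly_cost("gpt-4o"))         # 1350.0  (~$1,350/month)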

FAQ

Q: Is DeepSeek-V3 free? A: Weights: yes, MIT-licensed. Hosted API: paid but cheap (~$0.27/1M input tokens). Local inference: free after you cover the hardware. Most users start with the hosted API for prototyping and switch to self-hosting once volume justifies it.

Q: Is V3 actually as good as GPT-4o? A: On most benchmarks (MMLU, GPQA, HumanEval, MATH) it's within 1-3 points. On specialized tasks where GPT-4o has extra modalities or more recent training data (vision, current events), V3 lags. For general reasoning and code, the gap is small.

Q: Are there privacy concerns? A: DeepSeek's hosted API stores prompts per their privacy policy. For sensitive workloads, run locally or via a privacy-respecting host (Together, Fireworks, your own vLLM). The MIT license makes self-hosting fully legal.


Quick Use

  1. Sign up at platform.deepseek.com → API key
  2. Set OpenAI SDK base_url to https://api.deepseek.com/v1
  3. Use model="deepseek-chat" as a drop-in for GPT-4o code (smoke test below)
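
A minimal smoke test for the three steps, assuming DEEPSEEK_API_KEY is exported in your shell:

import os
from openai import OpenAI

# Same pattern as the hosted-API section above, condensed to one check
client = OpenAI(
    base_url="https://api.deepseek.com/v1",
    api_key=os.environ["DEEPSEEK_API_KEY"],
)
resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(resp.choices[0].message.content)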

Source & Thanks

Built by DeepSeek. Weights MIT-licensed.

deepseek-ai/DeepSeek-V3 — ⭐ 80,000+
