DeepSeek-V3 — Open-Weight 671B MoE Model with GPT-4o Quality

Quick Use

Sign up at platform.deepseek.com and create an API key.
Set the OpenAI SDK base_url to https://api.deepseek.com/v1.
Use model="deepseek-chat" as a drop-in replacement in existing GPT-4o code.

Intro

DeepSeek-V3 is the 671B-parameter mixture-of-experts model that put DeepSeek on the global map. It matches GPT-4o on most benchmarks while activating only 37B parameters per token. Weights are MIT-licensed, so you can download and run them anywhere. The hosted API costs $0.27 per 1M input tokens, roughly 9× cheaper than GPT-4o's $2.50.

Best for: cost-sensitive production workloads where you would otherwise use GPT-4o.
Works with: DeepSeek API (OpenAI-compatible), local via Ollama / vLLM / llama.cpp, AWS Bedrock.
Setup time: about 2 minutes for the hosted API.

Hosted API (OpenAI-compatible)

import os

from openai import OpenAI

# Point the OpenAI SDK at DeepSeek's endpoint; the key is read from the environment.
client = OpenAI(
    base_url="https://api.deepseek.com/v1",
    api_key=os.environ["DEEPSEEK_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-chat",  # alias for DeepSeek-V3
    messages=[{"role": "user", "content": "Compare LFP vs NMC battery chemistries"}],
    temperature=0.3,
)

# The reply text lives on the first choice, as with any OpenAI chat completion.
print(response.choices[0].message.content)

This is a drop-in replacement for existing OpenAI SDK code: switch base_url and model, and tool use, JSON mode, and streaming keep working as before.
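Streaming is one of the features that carries over. A minimal sketch, assuming the openai Python package (v1 or later) is installed and DEEPSEEK_API_KEY is set; the helper names join_deltas and stream_reply are illustrative and not part of any SDK:

```python
import os


def join_deltas(chunks):
    """Concatenate the text deltas from a stream of chat-completion chunks."""
    parts = []
    for chunk in chunks:
        # Some chunks carry no choices or an empty delta (for example the last one).
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            parts.append(text)
    return "".join(parts)


def stream_reply(prompt):
    """Send one prompt to the hosted API and return the streamed reply as a string."""
    from openai import OpenAI  # imported lazily so join_deltas works without the SDK

    client = OpenAI(
        base_url="https://api.deepseek.com/v1",
        api_key=os.environ["DEEPSEEK_API_KEY"],
    )
    stream = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # yields chunks as tokens are generated
    )
    return join_deltas(stream)
```

In an interactive app you would print each delta as it arrives instead of joining them at the end; the chunk handling is the same.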

Local via Ollama

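# The full 671B model is ~700GB in its native FP8 format; Ollama serves quantized builds.
ollama pull deepseek-v3:671b    # full MoE, quantized; still hundreds of GB, needs a multi-GPU server

# The small distilled models are published under deepseek-r1 (distilled from R1, not V3).
ollama pull deepseek-r1:8b      # ~5GB, 8B distilled
ollama pull deepseek-r1:32b     # ~20GB, 32B distilled

Most personal users want the 8B or 32B distilled models, which run on hobbyist hardware. They are distilled from DeepSeek-R1 rather than V3, so treat them as small reasoning models from the same family, not as a shrunken V3. Check the Ollama library page for the current tags and sizes.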
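The download sizes quoted above follow from parameter count times bytes per weight. A rough back-of-the-envelope sketch; it counts weights only and ignores KV cache, activations, and file-format overhead, and the function name weight_gb is illustrative:

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate weight size in GB (10^9 bytes), weights only."""
    return params_billion * bits_per_weight / 8


# Rows are (label, parameters in billions, bits per weight).
for name, params, bits in [
    ("8B distill, 4-bit", 8, 4),
    ("32B distill, 4-bit", 32, 4),
    ("671B, 8-bit", 671, 8),
    ("671B, 16-bit", 671, 16),
]:
    print(f"{name}: ~{weight_gb(params, bits):.0f} GB")
```

Real 4-bit formats store a little more than 4 bits per weight, which is why the 8B and 32B downloads land nearer 5GB and 20GB. By the same estimate, ~700GB corresponds to about 8 bits per weight for the 671B model; a 16-bit copy would be roughly twice that.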

Local via vLLM (production)

pip install vllm
python -m vllm.entrypoints.openai.api_server \
  --model deepseek-ai/DeepSeek-V3 \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.95

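The full model needs a multi-GPU server. The FP8 weights alone are roughly 700GB, which is more than the 640GB available on 8× H100 80GB, so plan for 8× H200-class GPUs (or more than one H100 node) once KV cache is included. The server exposes an OpenAI-compatible endpoint.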
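Because the hosted API, Ollama, and the vLLM server all accept OpenAI-style requests, one client can target any of them by changing base_url and model. A minimal sketch, assuming the default local ports (11434 for Ollama, 8000 for vLLM) and that the named models are already pulled or loaded; the BACKENDS table and the client_kwargs and ask helpers are illustrative:

```python
import os

# Endpoint and model per backend. Local ports are the servers' defaults;
# adjust them if you changed the bind address.
BACKENDS = {
    "hosted": {"base_url": "https://api.deepseek.com/v1", "model": "deepseek-chat"},
    "ollama": {"base_url": "http://localhost:11434/v1", "model": "deepseek-v3:671b"},
    "vllm": {"base_url": "http://localhost:8000/v1", "model": "deepseek-ai/DeepSeek-V3"},
}


def client_kwargs(backend):
    """Return (OpenAI constructor kwargs, model name) for a backend."""
    cfg = BACKENDS[backend]
    if backend == "hosted":
        key = os.environ.get("DEEPSEEK_API_KEY", "")
    else:
        # Local servers ignore the key by default, but the SDK wants a string.
        key = "not-needed"
    return {"base_url": cfg["base_url"], "api_key": key}, cfg["model"]


def ask(backend, prompt):
    """Send one prompt to the chosen backend and return the reply text."""
    from openai import OpenAI  # lazy import; needs `pip install openai`

    kwargs, model = client_kwargs(backend)
    client = OpenAI(**kwargs)
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Keeping the backend choice in one table makes it easy to prototype against the hosted API and later point the same code at a self-hosted server.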

Pricing snapshot

Source	Input $/1M tok	Output $/1M tok
DeepSeek API	$0.27	$1.10
OpenRouter	$0.27	$1.10
GPT-4o (compare)	$2.50	$10.00
Claude 3.5 Sonnet (compare)	$3.00	$15.00
Local (vLLM)	$0 (after hardware)	$0
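
To see what the per-token prices mean for a workload, the table can be turned into a small calculator. A sketch using only the prices listed above; the monthly token counts are made up for illustration:

```python
# (input, output) price in USD per 1M tokens, copied from the pricing table.
PRICES = {
    "DeepSeek API": (0.27, 1.10),
    "GPT-4o": (2.50, 10.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
}


def monthly_cost(provider, input_tokens, output_tokens):
    """Cost in USD for the given token counts."""
    price_in, price_out = PRICES[provider]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: 500M input tokens and 100M output tokens per month.
for provider in PRICES:
    print(f"{provider}: ${monthly_cost(provider, 500_000_000, 100_000_000):,.2f}")
```

For this mix DeepSeek comes to $245 against $2,250 for GPT-4o and $3,000 for Claude 3.5 Sonnet, roughly a 9× difference with GPT-4o.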

FAQ

Q: Is DeepSeek-V3 free?
A: The weights are, under the MIT license. The hosted API is paid but cheap (about $0.27 per 1M input tokens). Local inference is free once you have covered the hardware. Most users prototype on the hosted API and move to local or self-hosted inference once volume justifies it.
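
One way to judge when volume justifies self-hosting is a break-even estimate. A rough sketch using the hosted prices quoted on this page; the $20,000 monthly hardware figure and the 80/20 input/output split are hypothetical placeholders, and the estimate ignores power, operations, and utilization:

```python
def breakeven_tokens_millions(monthly_hardware_usd, input_share=0.8,
                              price_in=0.27, price_out=1.10):
    """Millions of tokens per month at which hosted API spend equals a fixed hardware bill."""
    # Blended hosted price in USD per 1M tokens for the assumed input/output mix.
    blended = input_share * price_in + (1 - input_share) * price_out
    return monthly_hardware_usd / blended


# Hypothetical: a GPU server that costs $20,000 per month all-in.
print(f"{breakeven_tokens_millions(20_000):,.0f}M tokens/month")
```

With these placeholder numbers the break-even sits at roughly 46 billion tokens per month, which is why the hosted API is usually the cheaper starting point.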

Q: Is V3 actually as good as GPT-4o?
A: On most benchmarks (MMLU, GPQA, HumanEval, MATH) it is within 1-3 points. V3 lags on tasks that depend on other modalities or very recent knowledge, such as vision (V3 is text-only) and the latest news. For general reasoning and code, the gap is small.

Q: Are there privacy concerns?
A: DeepSeek's hosted API stores prompts as described in its privacy policy. For sensitive workloads, run the model locally or use a privacy-respecting host (Together, Fireworks, or your own vLLM deployment). The MIT license permits self-hosting.

Source & Thanks

Built by DeepSeek. Weights MIT-licensed.

deepseek-ai/DeepSeek-V3 — ⭐ 80,000+
