Knowledge · May 8, 2026 · 4 min read

DeepSeek-V3 — Open-Weight 671B MoE Model with GPT-4o Quality

DeepSeek-V3 is a 671B-param MoE model (37B active per token). Matches GPT-4o on benchmarks. MIT-licensed weights, $0.27/1M input on the hosted API.

DeepSeek · Community
Agent-ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and the raw content to help agents judge fit, risk, and next actions.

Stage only · 15/100
Agent surface
Any MCP/CLI agent
Type
Knowledge
Installation
Stage only
Trust
Trust: New
Entry point
Asset
Universal CLI command
npx tokrepo install 1b0d1ab2-1edb-49e1-9853-b02807a64140
Introduction

DeepSeek-V3 is the 671B-parameter mixture-of-experts model that put DeepSeek on the global map — matches GPT-4o on most benchmarks while activating only 37B params per token. Weights are MIT-licensed (download and run anywhere). The hosted API costs $0.27 per 1M input tokens — about 10× cheaper than GPT-4o. Best for: cost-sensitive production where you'd otherwise use GPT-4o. Works with: DeepSeek API (OpenAI-compatible), local via Ollama / vLLM / llama.cpp, AWS Bedrock. Setup time: 2 minutes.


Hosted API (OpenAI-compatible)

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com/v1",
    api_key=os.environ["DEEPSEEK_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-chat",  # alias for DeepSeek-V3
    messages=[{"role": "user", "content": "Compare LFP vs NMC battery chemistries"}],
    temperature=0.3,
)

print(response.choices[0].message.content)

Drop-in for any OpenAI SDK code: switch base_url and model, and everything else (tool use, JSON mode, streaming) works unchanged.
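Because the endpoint speaks plain OpenAI-style HTTP, you can also call it with nothing but the standard library. A minimal sketch, including the JSON-mode flag (the helper names `build_request` and `chat` are ours, not part of any SDK):

```python
import json
import os
import urllib.request

def build_request(prompt: str, json_mode: bool = False) -> dict:
    """Build an OpenAI-style chat payload for DeepSeek-V3."""
    body = {
        "model": "deepseek-chat",
        "messages": [{"role": "user", "content": prompt}],
    }
    if json_mode:
        # JSON mode: the API guarantees a syntactically valid JSON object
        body["response_format"] = {"type": "json_object"}
    return body

def chat(prompt: str, json_mode: bool = False) -> str:
    """POST the payload to the hosted API and return the reply text."""
    req = urllib.request.Request(
        "https://api.deepseek.com/v1/chat/completions",
        data=json.dumps(build_request(prompt, json_mode)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Usage (requires DEEPSEEK_API_KEY):
#   print(chat("Return a JSON object with keys 'lfp' and 'nmc'", json_mode=True))
```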

Local via Ollama

# Pull a quantized version (full 671B is ~700GB!)
ollama pull deepseek-v3:8b      # ~5GB, 8B distilled
ollama pull deepseek-v3:32b     # ~20GB, 32B distilled
ollama pull deepseek-v3:671b    # ~700GB, full FP8 checkpoint; needs 8× H100

Most personal users want the 8B or 32B distilled variants — they capture much of V3's reasoning at hobbyist hardware cost.

Local via vLLM (production)

pip install vllm
python -m vllm.entrypoints.openai.api_server \
  --model deepseek-ai/DeepSeek-V3 \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.95

Requires 8× H100 (or equivalent ~640GB GPU memory) for the full model. The API endpoint is OpenAI-compatible.

Pricing snapshot

Source                        Input $/1M tok        Output $/1M tok
DeepSeek API                  $0.27                 $1.10
OpenRouter                    $0.27                 $1.10
GPT-4o (compare)              $2.50                 $10.00
Claude 3.5 Sonnet (compare)   $3.00                 $15.00
Local (vLLM)                  $0 (after hardware)   $0
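As a sanity check on the table, a small script that compares monthly spend (the workload numbers are illustrative, not from the article):

```python
# Per-1M-token rates (input, output) in USD, from the table above
RATES = {
    "DeepSeek API": (0.27, 1.10),
    "GPT-4o": (2.50, 10.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
}

def monthly_cost(source: str, input_m: float, output_m: float) -> float:
    """Monthly cost given millions of input and output tokens."""
    rate_in, rate_out = RATES[source]
    return input_m * rate_in + output_m * rate_out

# Illustrative workload: 100M input + 20M output tokens per month
ds = monthly_cost("DeepSeek API", 100, 20)   # 27 + 22 ≈ 49
oa = monthly_cost("GPT-4o", 100, 20)         # 250 + 200 ≈ 450
print(f"DeepSeek ${ds:.0f}/mo vs GPT-4o ${oa:.0f}/mo ({oa / ds:.1f}x)")
```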

FAQ

Q: Is DeepSeek-V3 free? A: The weights are, under the MIT license. The hosted API is paid but cheap (~$0.27/1M input tokens). Local inference is free once you cover the hardware. Most users start with the hosted API for prototyping and switch to self-hosting once volume justifies it.

Q: Is V3 actually as good as GPT-4o? A: On most benchmarks (MMLU, GPQA, HumanEval, MATH) it's within 1-3 points. On specialized tasks where GPT-4o has extra modalities or more recent training data (vision, current events), V3 lags. For general reasoning and code, the gap is small.

Q: Are there privacy concerns? A: DeepSeek's hosted API stores prompts per their privacy policy. For sensitive workloads, run locally or via a privacy-respecting host (Together, Fireworks, your own vLLM). The MIT license makes self-hosting fully legal.


Quick Use

  1. Sign up at platform.deepseek.com → API key
  2. Set OpenAI SDK base_url to https://api.deepseek.com/v1
  3. Use model="deepseek-chat" — drop-in for GPT-4o code


Source & Thanks

Built by DeepSeek. Weights MIT-licensed.

deepseek-ai/DeepSeek-V3 — ⭐ 80,000+

