Knowledge · May 8, 2026 · 4 min read

DeepSeek-V3 — Open-Weight 671B MoE Model with GPT-4o Quality

DeepSeek-V3 is a 671B-parameter MoE model with 37B parameters active per token. It scores close to GPT-4o on most public benchmarks, ships MIT-licensed weights, and costs $0.27 per 1M input tokens on the hosted API.

Agent-ready

Safe staging for this asset

This asset is staged first. The copied prompt asks you to inspect the staged files before activating scripts, MCP config, or global config.

Stage only · 27/100 · Policy: staging
Agent surface: Any MCP/CLI agent
Type: Knowledge
Installation: Stage only
Trust: Community
Entry: Asset

Safe staging command

npx -y tokrepo@latest install 1b0d1ab2-1edb-49e1-9853-b02807a64140 --target codex

Files are staged first; activation requires reviewing the README and the staged plan.

Introduction

DeepSeek-V3 is the 671B-parameter mixture-of-experts model that put DeepSeek on the global map. It comes within a few points of GPT-4o on most benchmarks while activating only 37B parameters per token. The weights are open: the V3-0324 checkpoint onward is MIT-licensed, while the original December 2024 release shipped under DeepSeek's own model license, so check the license file of the checkpoint you download. The hosted API costs $0.27 per 1M input tokens, roughly 9× cheaper than GPT-4o's $2.50.

Best for: cost-sensitive production where you'd otherwise use GPT-4o.
Works with: DeepSeek API (OpenAI-compatible), local via Ollama / vLLM / llama.cpp, AWS Bedrock.
Setup time: 2 minutes.
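To make the "671B total, 37B active" figure concrete, here is a toy sketch of top-k expert routing, the general mechanism a mixture-of-experts layer uses to run only a few experts per token. The expert count, `k`, and the router scores are made up for illustration; this is not DeepSeek-V3's actual configuration or routing function.

```python
import math


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their gate weights.

    router_logits: one score per expert for a single token (hypothetical values).
    Returns a list of (expert_index, gate_weight) pairs whose weights sum to 1.
    """
    # Rank expert indices by score, highest first, and keep only the top k.
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Softmax over the selected scores only; subtract the max for numerical stability.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Toy example: 8 experts, 2 active per token (illustrative numbers only).
logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.9]
selected = route_top_k(logits, k=2)
active_fraction = 37 / 671  # active vs. total parameters quoted above

print(selected)
print(f"active parameter share: {active_fraction:.1%}")
```

Only the selected experts' feed-forward weights are evaluated for a given token, which is why compute per token tracks the active parameter count while memory still has to hold the full model.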


Hosted API (OpenAI-compatible)

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com/v1",
    api_key=os.environ["DEEPSEEK_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-chat",  # alias for DeepSeek-V3
    messages=[{"role": "user", "content": "Compare LFP vs NMC battery chemistries"}],
    temperature=0.3,
)

print(response.choices[0].message.content)

This is a drop-in replacement for existing OpenAI SDK code: change base_url and model, and tool use, JSON mode, and streaming keep working through the same client calls.
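Because the endpoint is OpenAI-compatible, the same call can also be built without the SDK. The sketch below only constructs the HTTP request for a JSON-mode call so the wire format is visible; it does not send anything. The helper name `build_chat_request` and the placeholder key are hypothetical, and the code assumes the standard OpenAI-style `/chat/completions` path with the `response_format` and `stream` fields.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.deepseek.com/v1"


def build_chat_request(prompt, api_key, json_mode=False, stream=False):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": "deepseek-chat",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.3,
        "stream": stream,
    }
    if json_mode:
        # OpenAI-style JSON mode; the prompt itself should also ask for JSON.
        payload["response_format"] = {"type": "json_object"}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "Return LFP vs NMC trade-offs as JSON",
    api_key=os.environ.get("DEEPSEEK_API_KEY", "sk-placeholder"),
    json_mode=True,
)
body = json.loads(request.data)
print(request.full_url)
print(body["response_format"])

# To actually send it (needs a valid key and network access):
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Setting `stream=True` in the same payload switches the response to server-sent chunks, which the SDK exposes as an iterator.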

Local via Ollama

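# DeepSeek-V3 is published on Ollama only at full size
ollama pull deepseek-v3:671b    # ~400GB quantized download; needs a multi-GPU server
# The small distilled models belong to the DeepSeek-R1 family, not V3
ollama pull deepseek-r1:8b      # ~5GB, 8B distilled
ollama pull deepseek-r1:32b     # ~20GB, 32B distilled

There is no small V3 build. The 8B and 32B distilled models are smaller Llama- and Qwen-based checkpoints fine-tuned on DeepSeek-R1 outputs. They are the practical choice for personal hardware, but they are a different model family and should not be expected to reproduce V3's quality.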

Local via vLLM (production)

pip install vllm
python -m vllm.entrypoints.openai.api_server \
  --model deepseek-ai/DeepSeek-V3 \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.95

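The native FP8 weights are roughly 700GB, which already exceeds the 640GB of an 8× H100 80GB node, so plan for 8× H200 or a multi-node H100 deployment and set --tensor-parallel-size to match. The server exposes an OpenAI-compatible endpoint, by default at http://localhost:8000/v1.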
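A quick back-of-the-envelope check helps when sizing hardware for the full model. The sketch below estimates weight memory only, assuming 1 byte per parameter for FP8 and 2 bytes for BF16. KV cache, activations, and framework overhead come on top, so treat the result as a lower bound. The GPU sizes are nominal capacities and the function names are illustrative.

```python
def weight_memory_gb(params_billion, bytes_per_param):
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param


def weights_fit(params_billion, bytes_per_param, gpus, gb_per_gpu, utilization=0.95):
    """True if the weights alone fit in the usable memory of the GPU pool."""
    usable = gpus * gb_per_gpu * utilization
    return weight_memory_gb(params_billion, bytes_per_param) <= usable


fp8_gb = weight_memory_gb(671, 1)
bf16_gb = weight_memory_gb(671, 2)

print(f"FP8 weights:  {fp8_gb} GB")
print(f"BF16 weights: {bf16_gb} GB")
print("8x 80GB GPUs: ", weights_fit(671, 1, gpus=8, gb_per_gpu=80))
print("8x 141GB GPUs:", weights_fit(671, 1, gpus=8, gb_per_gpu=141))
print("16x 80GB GPUs:", weights_fit(671, 1, gpus=16, gb_per_gpu=80))
```

The `utilization` default mirrors the `--gpu-memory-utilization 0.95` flag in the command above. Under these assumptions the FP8 weights do not fit on eight 80GB GPUs but do fit on eight 141GB GPUs or sixteen 80GB GPUs.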

Pricing snapshot

Source                         Input $/1M tokens      Output $/1M tokens
DeepSeek API                   $0.27                  $1.10
OpenRouter                     $0.27                  $1.10
GPT-4o (compare)               $2.50                  $10.00
Claude 3.5 Sonnet (compare)    $3.00                  $15.00
Local (vLLM)                   $0 (after hardware)    $0
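To see what these prices mean for a concrete bill, the sketch below applies the listed rates to a hypothetical monthly workload. The token volumes are invented for illustration, and the prices are copied from the snapshot above, so update them if the providers change their rates.

```python
# Prices in USD per 1M tokens, copied from the pricing snapshot above.
PRICES = {
    "DeepSeek API": (0.27, 1.10),
    "GPT-4o": (2.50, 10.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
}


def monthly_cost(source, input_tokens, output_tokens):
    """Return the USD cost for a given token volume at the listed prices."""
    input_price, output_price = PRICES[source]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 10M input and 2M output tokens per month.
costs = {name: round(monthly_cost(name, 10_000_000, 2_000_000), 2) for name in PRICES}
for name, cost in costs.items():
    print(f"{name}: ${cost:.2f}")
```

For this workload the listed prices work out to $4.90 on the DeepSeek API, $45.00 on GPT-4o, and $60.00 on Claude 3.5 Sonnet.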

FAQ

Q: Is DeepSeek-V3 free? A: The weights are free to download under the MIT license. The hosted API is paid but cheap (about $0.27 per 1M input tokens). Local inference is free once you have covered the hardware. Most users prototype on the hosted API and move to self-hosting once volume justifies it.

Q: Is V3 actually as good as GPT-4o? A: On most benchmarks (MMLU, GPQA, HumanEval, MATH) it is within 1-3 points. V3 lags on tasks that depend on modalities or fresher training data that GPT-4o has, such as vision or recent news. For general reasoning and code, the gap is small.

Q: Are there privacy concerns? A: DeepSeek's hosted API stores prompts as described in its privacy policy. For sensitive workloads, run the model locally or through a third-party host you trust (Together, Fireworks, your own vLLM). The MIT license permits self-hosting, including for commercial use.


Quick Use

  1. Sign up at platform.deepseek.com and create an API key
  2. Set the OpenAI SDK base_url to https://api.deepseek.com/v1
  3. Use model="deepseek-chat" in place of your GPT-4o model name

Source & Thanks

Built by DeepSeek. Weights MIT-licensed.

deepseek-ai/DeepSeek-V3 — ⭐ 80,000+
