KnowledgeMay 8, 2026·4 min read

DeepSeek Coder — Code-Specialized Model for Local Inference

DeepSeek Coder is the code-specialized open-weight model with FIM (fill-in-middle) support. Beats Codestral on HumanEval. Drops into Continue, Aider.

Agent ready

Safe staging for this asset

This asset is staged first. The copied prompt tells the agent to inspect the staged files and ask before activating scripts, MCP config, or global config.

Stage only · 27/100Policy: stage
Agent surface
Any MCP/CLI agent
Kind
Knowledge
Install
Stage only
Trust
Trust: Community
Entrypoint
Asset
Safe staging command
npx -y tokrepo@latest install 08acf3a7-b56b-40d2-9c94-9a8eb773eca4 --target codex

Stages files first; activation requires review of the staged README and plan.

Intro

DeepSeek Coder is the code-specialized open-weight model — trained on 2T tokens of code across 100+ languages, with native fill-in-middle (FIM) support for tab autocomplete. Outperforms Codestral and matches GPT-4o on HumanEval and MBPP at a fraction of the cost. Best for: local tab autocomplete via Continue / Cursor's local mode, and code-heavy production agents that need cheap inference. Works with: Ollama, vLLM, llama.cpp, DeepSeek API, Continue, Aider. Setup time: 2 minutes.


Local with Ollama

ollama pull deepseek-coder:6.7b   # ~4GB, fits on most laptops
ollama pull deepseek-coder:33b    # ~20GB, M3 Pro / 4090 territory

# Quick test
ollama run deepseek-coder:6.7b
> Write a Rust function that returns the Nth Fibonacci with memoization.

Use as tab autocomplete in Continue

// Continue's config.json
{
  "tabAutocompleteModel": {
    "title": "DeepSeek Coder",
    "provider": "ollama",
    "model": "deepseek-coder:6.7b",
    "apiBase": "http://localhost:11434"
  },
  "models": [
    {
      "title": "DeepSeek Coder Chat",
      "provider": "ollama",
      "model": "deepseek-coder:33b"
    }
  ]
}

Use with Aider

# Hosted
export DEEPSEEK_API_KEY=sk-...
aider --model deepseek/deepseek-coder

# Local (BYOK Ollama)
aider --model ollama/deepseek-coder:33b

Fill-in-middle (FIM) format

DeepSeek Coder's tab-completion uses a specific FIM format:

<|fim_begin|>{prefix}<|fim_hole|>{suffix}<|fim_end|>

Continue / Aider / Cursor handle this automatically. If you're integrating manually, use the FIM tokens — completions are 10-30% better than naive prompting.

Pricing & versions

Variant Params RAM (4-bit) HumanEval Pass@1
deepseek-coder:1.3b 1.3B ~1GB ~38%
deepseek-coder:6.7b 6.7B ~4GB ~58%
deepseek-coder:33b 33B ~20GB ~76%
deepseek-coder-v2:236b (MoE) 236B (21B active) API only ~86%
GPT-4o (compare) API only ~90%

Hosted API: $0.14 / 1M input tokens — cheapest production-quality coder model.


FAQ

Q: Coder vs full DeepSeek-V3 for coding? A: Coder is smaller, faster, cheaper, FIM-aware — best for local autocomplete and quick code questions. V3 is bigger, broader, better at long-context reasoning across files. For tab autocomplete: Coder. For 'understand my whole repo and refactor': V3.

Q: Can I fine-tune DeepSeek Coder? A: Yes — open weights mean any standard LoRA / QLoRA tooling (axolotl, unsloth, trl) works. The 6.7B variant LoRAs are practical on a single 24GB GPU.

Q: Is the V2 MoE coder available locally? A: The V2 236B MoE has open weights but the size makes it impractical for single-machine local. Use it via DeepSeek API or rent GPU time on Together / Fireworks. The 33B dense version is the local-friendly sweet spot.


Quick Use

  1. Local: ollama pull deepseek-coder:6.7b
  2. Configure Continue / Aider / Cursor to use the local model
  3. Or use hosted API with model="deepseek-coder"

Intro

DeepSeek Coder is the code-specialized open-weight model — trained on 2T tokens of code across 100+ languages, with native fill-in-middle (FIM) support for tab autocomplete. Outperforms Codestral and matches GPT-4o on HumanEval and MBPP at a fraction of the cost. Best for: local tab autocomplete via Continue / Cursor's local mode, and code-heavy production agents that need cheap inference. Works with: Ollama, vLLM, llama.cpp, DeepSeek API, Continue, Aider. Setup time: 2 minutes.


Local with Ollama

ollama pull deepseek-coder:6.7b   # ~4GB, fits on most laptops
ollama pull deepseek-coder:33b    # ~20GB, M3 Pro / 4090 territory

# Quick test
ollama run deepseek-coder:6.7b
> Write a Rust function that returns the Nth Fibonacci with memoization.

Use as tab autocomplete in Continue

// Continue's config.json
{
  "tabAutocompleteModel": {
    "title": "DeepSeek Coder",
    "provider": "ollama",
    "model": "deepseek-coder:6.7b",
    "apiBase": "http://localhost:11434"
  },
  "models": [
    {
      "title": "DeepSeek Coder Chat",
      "provider": "ollama",
      "model": "deepseek-coder:33b"
    }
  ]
}

Use with Aider

# Hosted
export DEEPSEEK_API_KEY=sk-...
aider --model deepseek/deepseek-coder

# Local (BYOK Ollama)
aider --model ollama/deepseek-coder:33b

Fill-in-middle (FIM) format

DeepSeek Coder's tab-completion uses a specific FIM format:

<|fim_begin|>{prefix}<|fim_hole|>{suffix}<|fim_end|>

Continue / Aider / Cursor handle this automatically. If you're integrating manually, use the FIM tokens — completions are 10-30% better than naive prompting.

Pricing & versions

Variant Params RAM (4-bit) HumanEval Pass@1
deepseek-coder:1.3b 1.3B ~1GB ~38%
deepseek-coder:6.7b 6.7B ~4GB ~58%
deepseek-coder:33b 33B ~20GB ~76%
deepseek-coder-v2:236b (MoE) 236B (21B active) API only ~86%
GPT-4o (compare) API only ~90%

Hosted API: $0.14 / 1M input tokens — cheapest production-quality coder model.


FAQ

Q: Coder vs full DeepSeek-V3 for coding? A: Coder is smaller, faster, cheaper, FIM-aware — best for local autocomplete and quick code questions. V3 is bigger, broader, better at long-context reasoning across files. For tab autocomplete: Coder. For 'understand my whole repo and refactor': V3.

Q: Can I fine-tune DeepSeek Coder? A: Yes — open weights mean any standard LoRA / QLoRA tooling (axolotl, unsloth, trl) works. The 6.7B variant LoRAs are practical on a single 24GB GPU.

Q: Is the V2 MoE coder available locally? A: The V2 236B MoE has open weights but the size makes it impractical for single-machine local. Use it via DeepSeek API or rent GPU time on Together / Fireworks. The 33B dense version is the local-friendly sweet spot.


Source & Thanks

Built by DeepSeek. Weights MIT-licensed.

deepseek-ai/DeepSeek-Coder — ⭐ 23,000+

🙏

Source & Thanks

Built by DeepSeek. Weights MIT-licensed.

deepseek-ai/DeepSeek-Coder — ⭐ 23,000+

Discussion

Sign in to join the discussion.
No comments yet. Be the first to share your thoughts.