Configs · Jul 5, 2026 · 3 min read

DeepSpec — Full-Stack Speculative Decoding Training and Evaluation by DeepSeek

Open-source codebase from DeepSeek for training, evaluating, and deploying speculative decoding algorithms that accelerate LLM inference.

Agent-ready

This asset can be installed after choosing the runtime, verifying the plan, and running the appropriate command.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: DeepSpec Overview
Direct install command:
npx -y tokrepo@latest install 033cfc51-7809-11f1-9bc6-00163e2b0d79 --target codex

Run this after confirming the plan in dry-run mode.

Introduction

DeepSpec is an open-source framework from DeepSeek AI for training and evaluating speculative decoding algorithms. Speculative decoding accelerates LLM inference by letting a smaller draft model propose several tokens ahead, which the larger target (verifier) model then checks in a single parallel pass, accepting or rejecting each one. Because every emitted token is validated by the target model, the speedup comes without changing the output distribution.
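The draft-then-verify loop described above can be sketched in a few lines. This is a minimal illustration of the greedy variant only, not DeepSpec's implementation: `speculative_greedy_decode`, `toy_target`, and `toy_draft` are hypothetical names, the "models" are toy functions over integer tokens, and the target is called once per position where a real system would score all drafted positions in one batched forward pass.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function


def speculative_greedy_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[Token],
    max_new_tokens: int,
    draft_len: int = 4,
) -> List[Token]:
    """Greedy speculative decoding; output equals target-only greedy decoding."""
    out = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(out) < limit:
        # 1. The draft model proposes draft_len tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(draft_len):
            proposal.append(draft(out + proposal))

        # 2. The target checks the proposal position by position.
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # the target's token is always the one kept
            if tok != expected:
                break  # first mismatch: discard the remaining drafted tokens
        else:
            # Every drafted token matched, so the target yields one bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:limit]


def greedy_decode(model: NextToken, prompt: List[Token], n: int) -> List[Token]:
    """Plain autoregressive greedy decoding, used as the reference."""
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out


def toy_target(ctx: List[Token]) -> Token:
    """Hypothetical deterministic 'target model' over an 11-token vocabulary."""
    return (3 * ctx[-1] + len(ctx)) % 11


def toy_draft(ctx: List[Token]) -> Token:
    """Hypothetical draft that disagrees with the target at every third position."""
    tok = toy_target(ctx)
    return tok if len(ctx) % 3 else (tok + 1) % 11
```

Calling `speculative_greedy_decode(toy_target, toy_draft, [1, 2], 10)` returns the same sequence as `greedy_decode(toy_target, [1, 2], 10)`. Draft quality affects only how many target calls are needed, never the result.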

What DeepSpec Does

  • Trains draft models optimized for speculative decoding with target LLMs
  • Evaluates acceptance rates and speedup ratios across decoding strategies
  • Benchmarks different speculative decoding algorithms on standard tasks
  • Provides reproducible training pipelines for research and production
  • Supports multiple draft-verifier pairing configurations

Architecture Overview

DeepSpec implements the full speculative decoding pipeline: draft model training with distillation from the target model, tree-based speculative sampling for higher acceptance rates, and a verification step that guarantees the output distribution matches that of the target model exactly. The framework is modular, letting researchers swap components to test new algorithms.
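To make the distillation step concrete, here is a minimal sketch of a per-token distillation objective. The source does not say which loss DeepSpec uses. The forward KL divergence from target to draft is assumed here because it is a common choice, and the function names and the `temperature` parameter are hypothetical. It uses only the standard library, so it runs without PyTorch.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def distillation_kl(
    target_logits: List[float],
    draft_logits: List[float],
    temperature: float = 1.0,
) -> float:
    """KL(target || draft) for a single token position (assumed loss)."""
    log_p = log_softmax([x / temperature for x in target_logits])
    log_q = log_softmax([x / temperature for x in draft_logits])
    # Sum over the vocabulary of p(x) * (log p(x) - log q(x)).
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
```

The loss is zero when the draft reproduces the target's next-token distribution and grows as the two diverge. A draft trained this way tends to propose tokens the target would accept.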

Self-Hosting & Configuration

  • Requires Python 3.10+ and PyTorch with CUDA support
  • Configure draft and target model paths in the YAML config
  • Adjust tree width and depth parameters to trade drafting and verification cost against the number of tokens accepted per step
  • Distributed training supported via DeepSpeed or FSDP
  • Export optimized draft models for deployment with vLLM or TGI
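To get a feel for how the tree parameters scale, the sketch below counts the candidate tokens the target must score for a full draft tree. `TreeConfig` and its fields are hypothetical names, not DeepSpec's configuration schema, and a full tree with a fixed branching factor is an assumption. Practical systems often prune the tree to a fixed token budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreeConfig:
    """Hypothetical draft-tree shape: `width` children per node, `depth` levels."""

    width: int
    depth: int

    def num_candidates(self) -> int:
        """Drafted tokens the target must verify: width + width^2 + ... + width^depth."""
        return sum(self.width ** level for level in range(1, self.depth + 1))

    def num_paths(self) -> int:
        """Distinct root-to-leaf continuations the tree covers."""
        return self.width ** self.depth
```

With `width=1` the tree degenerates to a single chain, so `TreeConfig(1, 5).num_candidates()` is 5. With `width=2, depth=3` the verifier scores 14 candidates covering 8 alternative continuations. Wider trees raise the chance that some path is accepted, at the price of more verification work per step.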

Key Features

  • End-to-end pipeline from draft model training to production deployment
  • Tree-based speculative sampling improves acceptance rates over naive approaches
  • Guaranteed output equivalence with the target model (no quality degradation)
  • Comprehensive benchmarking suite for comparing decoding strategies
  • Integration with popular serving frameworks for production use

Comparison with Similar Tools

  • vLLM — high-throughput serving engine with built-in speculative decoding support
  • SGLang — fast LLM serving with RadixAttention but separate speculative decoding
  • Medusa — parallel decoding heads approach rather than separate draft models
  • TensorRT-LLM — NVIDIA's inference optimization with speculative decoding support
  • llama.cpp — local inference in C++ with basic speculative decoding

FAQ

Q: How much speedup can speculative decoding achieve? A: Typical speedups range from 1.5x to 3x depending on the draft model quality and task characteristics.
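A back-of-the-envelope estimate shows where figures in that range come from. The sketch uses the standard analysis of chain-style speculative decoding and makes simplifying assumptions: each drafted token is accepted independently with probability `alpha`, `gamma` tokens are drafted per step, and one draft forward pass costs `cost_ratio` times one target forward pass. The function names are illustrative.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target pass: (1 - alpha^(gamma+1)) / (1 - alpha)."""
    if alpha == 1.0:
        return float(gamma + 1)  # every draft accepted, plus the bonus token
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def expected_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Speedup over plain decoding, counting gamma draft passes and one target pass."""
    return expected_tokens_per_step(alpha, gamma) / (gamma * cost_ratio + 1.0)
```

For example, with an 80% acceptance rate, 4 drafted tokens per step, and a draft model that costs 5% of the target, `expected_speedup(0.8, 4, 0.05)` comes out at roughly 2.8x. Lower the acceptance rate to 0.5 and the same setup gives about 1.6x.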

Q: Does speculative decoding change the model output? A: No. The verification step guarantees that the output distribution is identical to running the target model alone.
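The guarantee can be checked numerically for a single token. The sketch below assumes the standard speculative sampling rule from the literature: accept a drafted token x with probability min(1, p(x)/q(x)), and on rejection resample from the normalized residual max(0, p - q). Whether DeepSpec uses exactly this rule is an assumption, and the function name is hypothetical. The code computes the resulting output distribution exactly instead of sampling.

```python
from typing import List


def speculative_output_distribution(p: List[float], q: List[float]) -> List[float]:
    """Exact distribution of one emitted token; p is the target, q the draft."""
    # Probability that x is drafted and accepted: q(x) * min(1, p(x)/q(x)) = min(p(x), q(x)).
    accept = [min(px, qx) for px, qx in zip(p, q)]
    reject_mass = 1.0 - sum(accept)

    # On rejection, resample from the normalized residual max(0, p - q).
    residual = [max(0.0, px - qx) for px, qx in zip(p, q)]
    norm = sum(residual)
    if norm == 0.0:
        return accept  # p == q: nothing is ever rejected

    return [a + reject_mass * r / norm for a, r in zip(accept, residual)]
```

For any draft distribution `q`, the result equals `p` up to floating-point error, for instance with `p = [0.5, 0.3, 0.2]` and `q = [0.2, 0.2, 0.6]`. A poor draft lowers the acceptance probability (the sum of `min(p, q)`) but cannot bias the output.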

Q: What models can be used as draft models? A: Any smaller model in the same family works. DeepSpec also supports training custom draft models from scratch.

Q: Can I use DeepSpec with open-weight models? A: Yes. It works with any model pair where you have weight access for both draft and target.
