# DeepSpec: Full-Stack Speculative Decoding Training and Evaluation by DeepSeek

> Open-source codebase from DeepSeek for training, evaluating, and deploying speculative decoding algorithms that accelerate LLM inference.

## Quick Use

```bash
git clone https://github.com/deepseek-ai/DeepSpec.git
cd DeepSpec
pip install -e .
python train.py --config configs/default.yaml
```

## Introduction

DeepSpec is an open-source framework from DeepSeek AI for training and evaluating speculative decoding algorithms. Speculative decoding accelerates LLM inference by using a smaller draft model to predict tokens that a larger verifier model then accepts or rejects in parallel, achieving significant speedups without changing output quality.

## What DeepSpec Does

- Trains draft models optimized for speculative decoding with target LLMs
- Evaluates acceptance rates and speedup ratios across decoding strategies
- Benchmarks different speculative decoding algorithms on standard tasks
- Provides reproducible training pipelines for research and production
- Supports multiple draft-verifier pairing configurations

## Architecture Overview

DeepSpec implements the full speculative decoding pipeline: draft model training with distillation from the target model, tree-based speculative sampling for higher acceptance rates, and a verification step that guarantees output quality matches the target model exactly. The framework is modular, letting researchers swap components to test new algorithms.

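To make the verification step concrete, here is a minimal sketch of the standard accept-or-resample rule used in speculative sampling, for a single drafted token. It is not DeepSpec's API: the function names (`verify_token`, `speculative_step`, `empirical_distribution`) and the toy probability vectors are hypothetical, and the tree-based variant described above is simplified to one token from one draft. It only illustrates why the accepted output follows the target model's distribution.

```python
import random


def verify_token(p, q, draft_token, rng):
    """Accept or resample one drafted token (hypothetical helper).

    p: target-model probabilities over the vocabulary
    q: draft-model probabilities over the same vocabulary
    draft_token: vocabulary index that was sampled from q
    Returns (token, accepted).
    """
    # Accept the draft with probability min(1, p[x] / q[x]).
    accept_prob = min(1.0, p[draft_token] / q[draft_token])
    if rng.random() < accept_prob:
        return draft_token, True
    # On rejection, resample from the residual max(0, p - q).
    # rng.choices normalizes the weights, so no explicit division is needed.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    token = rng.choices(range(len(p)), weights=residual, k=1)[0]
    return token, False


def speculative_step(p, q, rng):
    """Draft one token from q, then verify it against p."""
    draft_token = rng.choices(range(len(q)), weights=q, k=1)[0]
    return verify_token(p, q, draft_token, rng)


def empirical_distribution(p, q, n_samples, seed=0):
    """Run many steps; return (token frequencies, acceptance rate)."""
    rng = random.Random(seed)
    counts = [0] * len(p)
    accepted = 0
    for _ in range(n_samples):
        token, ok = speculative_step(p, q, rng)
        counts[token] += 1
        accepted += ok
    return [c / n_samples for c in counts], accepted / n_samples


if __name__ == "__main__":
    target = [0.6, 0.3, 0.1]  # toy target distribution (assumption)
    draft = [0.3, 0.5, 0.2]   # toy draft distribution (assumption)
    freqs, rate = empirical_distribution(target, draft, 50_000)
    print("token frequencies:", freqs)
    print("acceptance rate:", rate)
```

With these toy vectors, the token frequencies should come out close to `target` even though every proposal was drawn from `draft`, and the acceptance rate should be near the sum of the element-wise minimum of the two distributions (0.7 here). A draft that tracks the target more closely raises that rate, which is what draft-model training aims to improve.
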
## Self-Hosting & Configuration

- Requires Python 3.10+ and PyTorch with CUDA support
- Configure draft and target model paths in the YAML config
- Adjust tree width and depth parameters to trade drafting cost against the number of tokens accepted per step
- Distributed training supported via DeepSpeed or FSDP
- Export optimized draft models for deployment with vLLM or TGI

## Key Features

- End-to-end pipeline from draft model training to production deployment
- Tree-based speculative sampling improves acceptance rates over naive approaches
- Output distribution guaranteed to match the target model (no quality degradation)
- Comprehensive benchmarking suite for comparing decoding strategies
- Integration with popular serving frameworks for production use

## Comparison with Similar Tools

- **vLLM**: high-throughput serving engine with built-in speculative decoding support
- **SGLang**: fast LLM serving with RadixAttention but separate speculative decoding
- **Medusa**: parallel decoding heads approach rather than separate draft models
- **TensorRT-LLM**: NVIDIA's inference optimization with speculative decoding support
- **llama.cpp**: local inference in C++ with basic speculative decoding

## FAQ

**Q: How much speedup can speculative decoding achieve?**
A: Typical speedups range from 1.5x to 3x depending on the draft model quality and task characteristics.

**Q: Does speculative decoding change the model output?**
A: No. The verification step guarantees that the output distribution is identical to running the target model alone.

**Q: What models can be used as draft models?**
A: Any smaller model that shares the target's tokenizer and vocabulary works, typically one from the same model family. DeepSpec also supports training custom draft models from scratch.

**Q: Can I use DeepSpec with open-weight models?**
A: Yes. It works with any model pair where you have weight access for both draft and target.

## Sources

- https://github.com/deepseek-ai/DeepSpec

---

Source: https://tokrepo.com/en/workflows/asset-033cfc51

Author: AI Open Source