# ds4: DeepSeek Local Inference Engine by antirez

> A minimal C-based local inference engine for DeepSeek 4 Flash, optimized for Apple Metal and NVIDIA CUDA, created by the author of Redis.

## Quick Use

```bash
git clone https://github.com/antirez/ds4
cd ds4 && make
./ds4 --model deepseek-4-flash.gguf --prompt "Explain quicksort"
```

## Introduction

ds4 is a focused, minimal inference engine for running DeepSeek 4 Flash models locally. Written in C by Salvatore Sanfilippo (antirez, creator of Redis), it prioritizes simplicity and performance on consumer hardware with Metal and CUDA acceleration.

## What ds4 Does

- Runs DeepSeek 4 Flash models locally on Mac and Linux
- Leverages Apple Metal for GPU acceleration on macOS
- Supports NVIDIA CUDA for GPU inference on Linux
- Loads GGUF-format quantized models for memory efficiency
- Provides a simple CLI for interactive and batch inference

## Architecture Overview

ds4 is a single-file C program with minimal dependencies, following the same design philosophy antirez applied to Redis. It implements a streamlined transformer inference loop with KV-cache management, quantized weight loading, and platform-specific GPU kernels for Metal and CUDA. The codebase deliberately avoids frameworks in favor of direct hardware access.
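To make "quantized weight loading" more concrete, here is a minimal sketch in C of block-wise 8-bit quantization, loosely modeled on the 8-bit block formats found in GGUF files. Everything in it is an assumption for illustration: the names `block_q8`, `quantize_q8`, and `dot_q8`, the block size of 32, and the `float` scale are hypothetical and are not taken from the ds4 source. Which quantization types ds4 actually reads is not stated here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 32 /* values per block; an illustrative choice */

/* Hypothetical 8-bit block: 32 weights share one scale.
 * This is NOT ds4's actual memory layout. */
typedef struct {
    float scale;          /* original weight ~= scale * q[i] */
    int8_t q[BLOCK_SIZE]; /* quantized weights in [-127, 127] */
} block_q8;

/* Quantize n floats into n / BLOCK_SIZE blocks.
 * n must be a multiple of BLOCK_SIZE. */
void quantize_q8(const float *x, block_q8 *out, size_t n) {
    assert(n % BLOCK_SIZE == 0);
    for (size_t b = 0; b < n / BLOCK_SIZE; b++) {
        const float *src = x + b * BLOCK_SIZE;

        /* The largest magnitude in the block sets the scale. */
        float amax = 0.0f;
        for (int i = 0; i < BLOCK_SIZE; i++) {
            float a = src[i] < 0.0f ? -src[i] : src[i];
            if (a > amax) amax = a;
        }
        float scale = amax / 127.0f;
        float inv = scale > 0.0f ? 1.0f / scale : 0.0f;
        out[b].scale = scale;

        /* Round each scaled value to the nearest integer. */
        for (int i = 0; i < BLOCK_SIZE; i++) {
            float r = src[i] * inv;
            out[b].q[i] = (int8_t)(r < 0.0f ? r - 0.5f : r + 0.5f);
        }
    }
}

/* Dot product of a quantized weight row with a float vector,
 * dequantizing on the fly: one multiply by the scale per block. */
float dot_q8(const block_q8 *w, const float *x, size_t n) {
    float sum = 0.0f;
    for (size_t b = 0; b < n / BLOCK_SIZE; b++) {
        float acc = 0.0f;
        for (int i = 0; i < BLOCK_SIZE; i++)
            acc += (float)w[b].q[i] * x[b * BLOCK_SIZE + i];
        sum += w[b].scale * acc;
    }
    return sum;
}
```

Storing one scale per small block keeps the rounding error of each weight within half a quantization step of that block's own range, which is why block formats can shrink a model several times over while staying close to the original values. A real engine would run the inner loop in a Metal or CUDA kernel rather than in scalar C.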

## Self-Hosting & Configuration

- Build from source with `make` (no package manager needed)
- Download GGUF model files from Hugging Face or other sources
- Configure GPU layers and context size via CLI flags
- Set batch size and thread count for CPU-only inference
- No Docker required; runs as a single native binary

## Key Features

- Minimal C codebase with zero external dependencies
- Native Metal acceleration for Apple Silicon Macs
- CUDA support for NVIDIA GPUs
- GGUF model format for efficient quantized inference
- Created by antirez with the same simplicity principles as Redis

## Comparison with Similar Tools

- **llama.cpp**: broader model support; ds4 is DeepSeek-specialized and simpler
- **Ollama**: user-friendly wrapper; ds4 is a raw inference engine
- **vLLM**: production server; ds4 targets single-user local inference
- **ExLlamaV2**: Python-based; ds4 is pure C with no runtime dependencies

## FAQ

**Q: Which models does ds4 support?**
A: It is specifically designed for DeepSeek 4 Flash models in GGUF format.

**Q: Do I need a GPU?**
A: No. It runs on CPU, but Metal (Mac) or CUDA (NVIDIA) acceleration significantly improves speed.

**Q: Who is antirez?**
A: Salvatore Sanfilippo, the creator of Redis. He is known for writing high-quality, minimal C programs.

**Q: How does it compare to llama.cpp?**
A: ds4 is narrower in scope, targeting only DeepSeek models. This focus allows a simpler codebase and potentially better optimization for that model family.

## Sources

- https://github.com/antirez/ds4

---

Source: https://tokrepo.com/en/workflows/asset-da2cc493

Author: AI Open Source