Introduction
ds4 is a focused, minimal inference engine for running DeepSeek 4 Flash models locally. Written in C by Salvatore Sanfilippo (antirez, creator of Redis), it prioritizes simplicity and performance on consumer hardware with Metal and CUDA acceleration.
What ds4 Does
- Runs DeepSeek 4 Flash models locally on Mac and Linux
- Leverages Apple Metal for GPU acceleration on macOS
- Supports NVIDIA CUDA for GPU inference on Linux
- Loads GGUF-format quantized models for memory efficiency
- Provides a simple CLI for interactive and batch inference
Architecture Overview
ds4 is a single-file C program with minimal dependencies, following the same design philosophy antirez applied to Redis. It implements a streamlined transformer inference loop with KV-cache management, quantized weight loading, and platform-specific GPU kernels for Metal and CUDA. The codebase deliberately avoids frameworks in favor of direct hardware access.
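To make the KV-cache idea concrete, here is a minimal sketch of a single-head cache in C. It is not taken from ds4: the `kv_cache` struct and the `kv_init`, `kv_append`, `kv_scores`, and `kv_bytes` names are hypothetical, and the float32 row-major layout is an assumption chosen for readability. It shows only the bookkeeping: each new token appends one key/value row, and a query is scored against every cached key. Softmax and the weighted sum over values are omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-head KV cache (illustrative, not ds4's layout).
 * Keys and values are stored row-major, one row per token position. */
typedef struct {
    int dim;       /* head dimension */
    int capacity;  /* maximum number of positions (the context size) */
    int len;       /* positions filled so far */
    float *k;      /* capacity * dim floats */
    float *v;      /* capacity * dim floats */
} kv_cache;

/* Allocate storage; returns 0 on success, -1 on allocation failure. */
int kv_init(kv_cache *c, int dim, int capacity) {
    size_t n = (size_t)dim * (size_t)capacity;
    c->dim = dim;
    c->capacity = capacity;
    c->len = 0;
    c->k = malloc(n * sizeof(float));
    c->v = malloc(n * sizeof(float));
    if (!c->k || !c->v) {
        free(c->k);
        free(c->v);
        c->k = c->v = NULL;
        return -1;
    }
    return 0;
}

void kv_free(kv_cache *c) {
    free(c->k);
    free(c->v);
    c->k = c->v = NULL;
    c->len = 0;
}

/* Append the newest token's key and value; returns -1 if the cache is full. */
int kv_append(kv_cache *c, const float *key, const float *val) {
    assert(c->k != NULL && c->v != NULL);
    if (c->len >= c->capacity) return -1;
    size_t off = (size_t)c->len * (size_t)c->dim;
    memcpy(c->k + off, key, sizeof(float) * (size_t)c->dim);
    memcpy(c->v + off, val, sizeof(float) * (size_t)c->dim);
    c->len++;
    return 0;
}

/* Raw attention scores: dot product of q with every cached key.
 * scores must have room for c->len floats. */
void kv_scores(const kv_cache *c, const float *q, float *scores) {
    for (int t = 0; t < c->len; t++) {
        float s = 0.0f;
        for (int i = 0; i < c->dim; i++) s += q[i] * c->k[t * c->dim + i];
        scores[t] = s;
    }
}

/* Bytes needed for K and V across all layers at float32 precision. */
size_t kv_bytes(int layers, int dim, int capacity) {
    return 2u * (size_t)layers * (size_t)capacity * (size_t)dim * sizeof(float);
}
```

The `kv_bytes` helper illustrates why context size matters for memory: under these assumptions the cache grows linearly with the number of positions it can hold.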
Self-Hosting & Configuration
- Build from source with make (no package manager needed)
- Download GGUF model files from Hugging Face or other sources
- Configure GPU layers and context size via CLI flags
- Set batch size and thread count for CPU-only inference
- No Docker required; runs as a single native binary
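After downloading a model, it can be useful to confirm that the file really is GGUF before pointing an engine at it. The sketch below parses the fixed 24-byte GGUF header as laid out in the public GGUF specification for version 2 and later: a 4-byte `GGUF` magic, a little-endian 32-bit version, then 64-bit tensor and metadata key/value counts. The `gguf_header` struct and the function names are hypothetical, and this is not ds4's loader.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fixed-size fields at the start of a GGUF file (version 2 and later). */
typedef struct {
    uint32_t version;
    uint64_t tensor_count;
    uint64_t metadata_kv_count;
} gguf_header;

/* Read an nbytes-wide little-endian unsigned integer from p. */
static uint64_t read_le(const unsigned char *p, int nbytes) {
    uint64_t v = 0;
    for (int i = nbytes - 1; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

/* Parse the header from a buffer.
 * Returns 0 on success, -1 if the buffer is too short or the magic is wrong. */
int gguf_parse_header(const unsigned char *buf, size_t len, gguf_header *h) {
    assert(buf != NULL && h != NULL);
    if (len < 24 || memcmp(buf, "GGUF", 4) != 0) return -1;
    h->version = (uint32_t)read_le(buf + 4, 4);
    h->tensor_count = read_le(buf + 8, 8);
    h->metadata_kv_count = read_le(buf + 16, 8);
    return 0;
}

/* Convenience wrapper: read the first 24 bytes of a file and parse them.
 * Returns -1 if the file cannot be opened or is not GGUF. */
int gguf_read_header(const char *path, gguf_header *h) {
    unsigned char buf[24];
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    return gguf_parse_header(buf, n, h);
}
```

Calling `gguf_read_header` on a downloaded file and checking for a zero return is a cheap way to catch truncated downloads or HTML error pages saved under a `.gguf` name.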
Key Features
- Small C codebase with minimal dependencies
- Native Metal acceleration for Apple Silicon Macs
- CUDA support for NVIDIA GPUs
- GGUF model format for efficient quantized inference
- Created by antirez with the same simplicity principles as Redis
Comparison with Similar Tools
- llama.cpp — broader model support; ds4 is DeepSeek-specialized and simpler
- Ollama — user-friendly wrapper; ds4 is a raw inference engine
- vLLM — production server; ds4 targets single-user local inference
- ExLlamaV2 — Python-based; ds4 is pure C with no runtime dependencies
FAQ
Q: Which models does ds4 support?
A: It is designed specifically for DeepSeek 4 Flash models in GGUF format.
Q: Do I need a GPU?
A: No. It runs on CPU, but Metal (Mac) or CUDA (NVIDIA) acceleration significantly improves speed.
Q: Who is antirez?
A: Salvatore Sanfilippo, the creator of Redis. He is known for writing high-quality, minimal C programs.
Q: How does it compare to llama.cpp?
A: ds4 is narrower in scope, targeting only DeepSeek 4 Flash models. That focus allows a simpler codebase and leaves room for optimizations specific to that model family.