# ds4 — DeepSeek Local Inference Engine by antirez

> A minimal C-based local inference engine for DeepSeek 4 Flash, optimized for Apple Metal and NVIDIA CUDA, created by the author of Redis.

## Quick Use
```bash
git clone https://github.com/antirez/ds4
cd ds4 && make
./ds4 --model deepseek-4-flash.gguf --prompt "Explain quicksort"
```

## Introduction
ds4 is a focused, minimal inference engine for running DeepSeek 4 Flash models locally. Written in C by Salvatore Sanfilippo (antirez, creator of Redis), it prioritizes simplicity and performance on consumer hardware with Metal and CUDA acceleration.

## What ds4 Does
- Runs DeepSeek 4 Flash models locally on Mac and Linux
- Leverages Apple Metal for GPU acceleration on macOS
- Supports NVIDIA CUDA for GPU inference on Linux
- Loads GGUF-format quantized models for memory efficiency
- Provides a simple CLI for interactive and batch inference
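
To make the GGUF bullet concrete, here is a minimal sketch of reading the fixed-size prefix of a GGUF file (version 2 and later): the 4-byte magic `GGUF`, a 32-bit version, then 64-bit tensor and metadata counts, all little-endian. The names `gguf_header`, `gguf_parse_header`, and `gguf_read_header` are hypothetical and are not taken from the ds4 source; how ds4 itself parses model files may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fixed-size prefix of a GGUF file (version 2 and later).
 * Hypothetical struct for illustration, not ds4's own type. */
typedef struct {
    uint32_t version;
    uint64_t tensor_count;
    uint64_t metadata_kv_count;
} gguf_header;

/* Decode little-endian integers byte by byte, so host endianness is irrelevant. */
static uint32_t le32(const unsigned char *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t le64(const unsigned char *p) {
    return (uint64_t)le32(p) | (uint64_t)le32(p + 4) << 32;
}

/* Parse the first 24 bytes of a file image.
 * Returns 0 on success, -1 if the buffer is too short or the magic is wrong. */
int gguf_parse_header(const unsigned char *buf, size_t len, gguf_header *out) {
    assert(buf != NULL && out != NULL);
    if (len < 24 || memcmp(buf, "GGUF", 4) != 0) return -1;
    out->version = le32(buf + 4);
    out->tensor_count = le64(buf + 8);
    out->metadata_kv_count = le64(buf + 16);
    return 0;
}

/* Convenience wrapper: read the header straight from a file on disk. */
int gguf_read_header(const char *path, gguf_header *out) {
    unsigned char buf[24];
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    return gguf_parse_header(buf, n, out);
}
```

Calling `gguf_read_header("deepseek-4-flash.gguf", &h)` before a long load is a cheap way to confirm that a downloaded file really is GGUF and to see how many tensors it declares.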

## Architecture Overview
ds4 is a single-file C program with minimal dependencies, following the same design philosophy antirez applied to Redis. It implements a streamlined transformer inference loop with KV-cache management, quantized weight loading, and platform-specific GPU kernels for Metal and CUDA. The codebase deliberately avoids frameworks in favor of direct hardware access.
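
As an illustration of what KV-cache management means in an inference loop, the sketch below stores one key vector and one value vector per generated position for a single layer, so earlier tokens never need to be recomputed. It is a generic textbook layout under simplifying assumptions (one layer, one head, `float` storage); the type `kv_cache` and the `kv_*` functions are hypothetical and do not describe ds4's actual data structures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-layer KV cache: one key and one value vector per position. */
typedef struct {
    int n_ctx;  /* maximum number of positions (context size) */
    int dim;    /* floats per key/value vector */
    int len;    /* positions filled so far */
    float *k;   /* n_ctx * dim floats */
    float *v;   /* n_ctx * dim floats */
} kv_cache;

/* Allocate storage for n_ctx positions. Returns 0 on success, -1 on failure. */
int kv_init(kv_cache *c, int n_ctx, int dim) {
    assert(c != NULL && n_ctx > 0 && dim > 0);
    c->n_ctx = n_ctx;
    c->dim = dim;
    c->len = 0;
    c->k = calloc((size_t)n_ctx * dim, sizeof(float));
    c->v = calloc((size_t)n_ctx * dim, sizeof(float));
    if (c->k == NULL || c->v == NULL) {
        free(c->k);
        free(c->v);
        c->k = c->v = NULL;
        return -1;
    }
    return 0;
}

/* Append the key/value of the newest token.
 * Returns its position, or -1 when the context window is full. */
int kv_append(kv_cache *c, const float *k, const float *v) {
    if (c->len >= c->n_ctx) return -1;
    size_t off = (size_t)c->len * c->dim;
    memcpy(c->k + off, k, (size_t)c->dim * sizeof(float));
    memcpy(c->v + off, v, (size_t)c->dim * sizeof(float));
    return c->len++;
}

/* Read-only access to the cached vectors of an earlier position. */
const float *kv_key(const kv_cache *c, int pos) {
    assert(pos >= 0 && pos < c->len);
    return c->k + (size_t)pos * c->dim;
}

const float *kv_value(const kv_cache *c, int pos) {
    assert(pos >= 0 && pos < c->len);
    return c->v + (size_t)pos * c->dim;
}

void kv_free(kv_cache *c) {
    free(c->k);
    free(c->v);
    c->k = c->v = NULL;
    c->len = 0;
}
```

In a real engine each decoding step appends one entry per layer and then attends over positions `0..len-1`, which is why the context-size setting directly determines how much memory the cache needs.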

## Self-Hosting & Configuration
- Build from source with `make` (no package manager needed)
- Download GGUF model files from Hugging Face or other sources
- Configure GPU layers and context size via CLI flags
- Set batch size and thread count for CPU-only inference
- No Docker required; runs as a single native binary

## Key Features
- Minimal C codebase with few dependencies beyond the platform GPU toolkits (Metal or CUDA)
- Native Metal acceleration for Apple Silicon Macs
- CUDA support for NVIDIA GPUs
- GGUF model format for efficient quantized inference
- Created by antirez with the same simplicity principles as Redis

## Comparison with Similar Tools
- **llama.cpp** — broader model support; ds4 is DeepSeek-specialized and simpler
- **Ollama** — user-friendly wrapper; ds4 is a raw inference engine
- **vLLM** — production server; ds4 targets single-user local inference
- **ExLlamaV2** — Python-based; ds4 is pure C with no runtime dependencies

## FAQ
**Q: Which models does ds4 support?**
A: It is specifically designed for DeepSeek 4 Flash models in GGUF format.

**Q: Do I need a GPU?**
A: No. It runs on CPU, but Metal (Mac) or CUDA (NVIDIA) acceleration significantly improves speed.

**Q: Who is antirez?**
A: Salvatore Sanfilippo, the creator of Redis. He is known for writing high-quality, minimal C programs.

**Q: How does it compare to llama.cpp?**
A: ds4 is narrower in scope, targeting only DeepSeek models. This focus allows a simpler codebase and potentially better optimization for that model family.

## Sources
- https://github.com/antirez/ds4


---
Source: https://tokrepo.com/en/workflows/asset-da2cc493
Author: AI Open Source