What is ds4 — DeepSeek Local Inference Engine by antirez?

ds4 is a minimal local inference engine for DeepSeek 4 Flash, written in C and optimized for Apple Metal and NVIDIA CUDA. It was created by antirez, the author of Redis.

Is ds4 — DeepSeek Local Inference Engine by antirez free to use?

Yes. ds4 — DeepSeek Local Inference Engine by antirez is freely available on TokRepo. Check the Source & Thanks section on the asset page for the specific open-source license.

How do I install ds4 — DeepSeek Local Inference Engine by antirez?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

ds4 — DeepSeek Local Inference Engine by antirez

Introduction

ds4 is a focused, minimal inference engine for running DeepSeek 4 Flash models locally. Written in C by Salvatore Sanfilippo (antirez, creator of Redis), it prioritizes simplicity and performance on consumer hardware with Metal and CUDA acceleration.

What ds4 Does

Runs DeepSeek 4 Flash models locally on Mac and Linux
Leverages Apple Metal for GPU acceleration on macOS
Supports NVIDIA CUDA for GPU inference on Linux
Loads GGUF-format quantized models for memory efficiency
Provides a simple CLI for interactive and batch inference
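
As a rough illustration of what loading a GGUF file involves, the sketch below parses the fixed header that the GGUF specification places at the start of every file: the 4-byte magic "GGUF", a 32-bit version, and two 64-bit counts (tensors and metadata key/value pairs, as in GGUF version 2 and later). It is not taken from ds4's source. The names gguf_header, gguf_parse_header, and gguf_read_header are hypothetical, and the file is assumed to be little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define GGUF_HEADER_SIZE 24 /* 4 (magic) + 4 (version) + 8 + 8 (counts) */

/* Fixed-size prefix of a GGUF file, version 2 or later. */
typedef struct {
    uint32_t version;
    uint64_t tensor_count;
    uint64_t metadata_kv_count;
} gguf_header;

/* Decode little-endian integers byte by byte, independent of host endianness. */
static uint32_t read_le32(const unsigned char *p) {
    uint32_t v = 0;
    for (int i = 3; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

static uint64_t read_le64(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

/* Parse the header from a memory buffer. Returns 0 on success, -1 if the
 * buffer is too short or does not start with the "GGUF" magic. */
int gguf_parse_header(const unsigned char *buf, size_t len, gguf_header *out) {
    assert(buf != NULL && out != NULL);
    if (len < GGUF_HEADER_SIZE) return -1;
    if (memcmp(buf, "GGUF", 4) != 0) return -1;
    out->version = read_le32(buf + 4);
    out->tensor_count = read_le64(buf + 8);
    out->metadata_kv_count = read_le64(buf + 16);
    return 0;
}

/* Convenience wrapper: read and parse the header of the file at `path`. */
int gguf_read_header(const char *path, gguf_header *out) {
    unsigned char buf[GGUF_HEADER_SIZE];
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    return gguf_parse_header(buf, n, out);
}
```

A loader built along these lines would typically reject the file when the function returns -1, before reading any metadata or tensor data.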

Architecture Overview

ds4 is a single-file C program with minimal dependencies, following the same design philosophy antirez applied to Redis. It implements a streamlined transformer inference loop with KV-cache management, quantized weight loading, and platform-specific GPU kernels for Metal and CUDA. The codebase deliberately avoids frameworks in favor of direct hardware access.
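
To make the idea of quantized weights concrete, here is a minimal sketch of block-wise 8-bit quantization and a dot product computed directly on the quantized data. It is an illustration under stated assumptions, not ds4's actual code: the block layout is loosely modeled on the Q8_0 type used in GGUF files (32 signed 8-bit values sharing one scale), simplified to store the scale as a 32-bit float rather than a 16-bit one. The names qblock, quantize_block, and qdot are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QBLOCK 32 /* weights per block, assumed for illustration */

/* One quantized block: QBLOCK signed 8-bit values sharing a single scale. */
typedef struct {
    float  scale;
    int8_t q[QBLOCK];
} qblock;

/* Quantize QBLOCK floats so that x[i] is approximately scale * q[i]. */
void quantize_block(const float *x, qblock *b) {
    float amax = 0.0f; /* largest absolute value in the block */
    for (size_t i = 0; i < QBLOCK; i++) {
        float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > amax) amax = a;
    }
    b->scale = amax / 127.0f;
    float inv = b->scale != 0.0f ? 1.0f / b->scale : 0.0f;
    for (size_t i = 0; i < QBLOCK; i++) {
        float s = x[i] * inv;
        /* Round half away from zero without depending on libm. */
        b->q[i] = (int8_t)(s >= 0.0f ? s + 0.5f : s - 0.5f);
    }
}

/* Dot product of a quantized weight row with a float activation vector.
 * n is the row length in elements and must be a multiple of QBLOCK. */
float qdot(const qblock *row, const float *x, size_t n) {
    assert(n % QBLOCK == 0);
    float acc = 0.0f;
    for (size_t b = 0; b < n / QBLOCK; b++) {
        float sum = 0.0f;
        for (size_t i = 0; i < QBLOCK; i++)
            sum += (float)row[b].q[i] * x[b * QBLOCK + i];
        acc += row[b].scale * sum; /* apply the scale once per block */
    }
    return acc;
}
```

Storing one scale per small block keeps the rounding error proportional to the largest weight in that block rather than in the whole tensor, which is the usual reason quantized formats are organized this way.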

Self-Hosting & Configuration

Build from source with make (no package manager needed)
Download GGUF model files from Hugging Face or other sources
Configure GPU layers and context size via CLI flags
Set batch size and thread count for CPU-only inference
No Docker required; runs as a single native binary

Key Features

Minimal C codebase with zero external dependencies
Native Metal acceleration for Apple Silicon Macs
CUDA support for NVIDIA GPUs
GGUF model format for efficient quantized inference
Created by antirez with the same simplicity principles as Redis

Comparison with Similar Tools

llama.cpp — broader model support; ds4 is DeepSeek-specialized and simpler
Ollama — user-friendly wrapper; ds4 is a raw inference engine
vLLM — production server; ds4 targets single-user local inference
ExLlamaV2 — Python-based; ds4 is pure C with no runtime dependencies

FAQ

Q: Which models does ds4 support? A: It is specifically designed for DeepSeek 4 Flash models in GGUF format.

Q: Do I need a GPU? A: No. It runs on CPU, but Metal (Mac) or CUDA (NVIDIA) acceleration significantly improves speed.

Q: Who is antirez? A: Salvatore Sanfilippo, the creator of Redis. He is known for writing high-quality, minimal C programs.

Q: How does it compare to llama.cpp? A: ds4 is narrower in scope, targeting only DeepSeek 4 Flash models, while llama.cpp supports many model families. This focus allows a simpler codebase and potentially better optimization for that one model.

Sources

https://github.com/antirez/ds4

