Configs · May 24, 2026 · 2 min read

ds4 — DeepSeek Local Inference Engine by antirez

A minimal C-based local inference engine for DeepSeek 4 Flash, optimized for Apple Metal and NVIDIA CUDA, created by the author of Redis.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and the raw content to help agents judge fit, risk, and next steps.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: ds4 Overview
Universal CLI command:
npx tokrepo install da2cc493-5727-11f1-9bc6-00163e2b0d79

Introduction

ds4 is a focused, minimal inference engine for running DeepSeek 4 Flash models locally. Written in C by Salvatore Sanfilippo (antirez, creator of Redis), it prioritizes simplicity and performance on consumer hardware with Metal and CUDA acceleration.

What ds4 Does

  • Runs DeepSeek 4 Flash models locally on Mac and Linux
  • Leverages Apple Metal for GPU acceleration on macOS
  • Supports NVIDIA CUDA for GPU inference on Linux
  • Loads GGUF-format quantized models for memory efficiency
  • Provides a simple CLI interface for interactive and batch inference

Architecture Overview

ds4 is a single-file C program with minimal dependencies, built on the same design philosophy antirez applied to Redis. It implements a streamlined transformer inference loop with KV-cache management, quantized weight loading, and platform-specific GPU kernels for Metal and CUDA. The codebase deliberately avoids ML frameworks and calls the Metal and CUDA GPU APIs directly instead.

Self-Hosting & Configuration

  • Build from source with make (no package manager needed)
  • Download GGUF model files from Hugging Face or other sources
  • Configure GPU layers and context size via CLI flags
  • Set batch size and thread count for CPU-only inference
  • No Docker required; runs as a single native binary

Key Features

  • Minimal C codebase with few external dependencies beyond the platform GPU toolchain (Metal or CUDA)
  • Native Metal acceleration for Apple Silicon Macs
  • CUDA support for NVIDIA GPUs
  • GGUF model format for efficient quantized inference
  • Created by antirez with the same simplicity principles as Redis

Comparison with Similar Tools

  • llama.cpp — broader model support; ds4 is DeepSeek-specialized and simpler
  • Ollama — user-friendly wrapper; ds4 is a raw inference engine
  • vLLM — production server; ds4 targets single-user local inference
  • ExLlamaV2 — Python-based; ds4 is plain C and needs no Python runtime

FAQ

Q: Which models does ds4 support? A: It is specifically designed for DeepSeek 4 Flash models in GGUF format.

Q: Do I need a GPU? A: No. It runs on CPU, but Metal (Mac) or CUDA (NVIDIA) acceleration significantly improves speed.

Q: Who is antirez? A: Salvatore Sanfilippo, the creator of Redis. He is known for writing high-quality, minimal C programs.

Q: How does it compare to llama.cpp? A: ds4 is narrower in scope, targeting only DeepSeek 4 Flash rather than many model families. This focus allows a simpler codebase and potentially better optimization for that one model.



Similar assets