Configs · May 24, 2026 · 2 min read

ds4 — DeepSeek Local Inference Engine by antirez

A minimal C-based local inference engine for DeepSeek 4 Flash, optimized for Apple Metal and NVIDIA CUDA, created by the author of Redis.

Universal CLI command
npx tokrepo install da2cc493-5727-11f1-9bc6-00163e2b0d79

Introduction

ds4 is a focused, minimal inference engine for running DeepSeek 4 Flash models locally. Written in C by Salvatore Sanfilippo (antirez, creator of Redis), it prioritizes simplicity and performance on consumer hardware with Metal and CUDA acceleration.

What ds4 Does

  • Runs DeepSeek 4 Flash models locally on Mac and Linux
  • Leverages Apple Metal for GPU acceleration on macOS
  • Supports NVIDIA CUDA for GPU inference on Linux
  • Loads GGUF-format quantized models for memory efficiency
  • Provides a simple command-line interface for interactive and batch inference
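
To illustrate why quantized weights save memory, here is a minimal sketch of block quantization in C. It is not taken from ds4 and does not reproduce any specific GGUF tensor type: the `block_q8` layout, the block size of 32, and the function names are assumptions chosen for illustration (real GGUF block types typically store the scale in a more compact form than a 32-bit float).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 32 /* values per block; an illustrative choice */

/* Hypothetical block layout: one shared scale plus BLOCK_SIZE signed
 * 8-bit values. */
typedef struct {
    float scale;
    int8_t q[BLOCK_SIZE];
} block_q8;

/* Round to nearest integer without pulling in libm. The caller
 * guarantees the value is within the int8_t range. */
static int8_t round_to_i8(float v) {
    return (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

/* Quantize BLOCK_SIZE floats into one block: the largest magnitude in
 * the block maps to 127, everything else scales proportionally. */
void quantize_block(const float *x, block_q8 *out) {
    float amax = 0.0f;
    for (int i = 0; i < BLOCK_SIZE; i++) {
        float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > amax) amax = a;
    }
    out->scale = amax / 127.0f;
    float inv = out->scale > 0.0f ? 1.0f / out->scale : 0.0f;
    for (int i = 0; i < BLOCK_SIZE; i++)
        out->q[i] = round_to_i8(x[i] * inv);
}

/* Dot product of a quantized weight row with float activations.
 * Weights are dequantized on the fly: w = scale * q. */
float dot_q8(const block_q8 *row, const float *x, size_t n) {
    assert(n % BLOCK_SIZE == 0);
    float sum = 0.0f;
    for (size_t b = 0; b < n / BLOCK_SIZE; b++) {
        float partial = 0.0f;
        for (int i = 0; i < BLOCK_SIZE; i++)
            partial += (float)row[b].q[i] * x[b * BLOCK_SIZE + i];
        sum += row[b].scale * partial; /* apply the scale once per block */
    }
    return sum;
}
```

In this layout each block of 32 weights takes one float plus 32 bytes instead of 32 floats, at the cost of a small rounding error per weight. Lower-bit formats push the same idea further.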

Architecture Overview

ds4 is a single-file C program with minimal dependencies, following the same design philosophy antirez applied to Redis. It implements a streamlined transformer inference loop with KV-cache management, quantized weight loading, and platform-specific GPU kernels for Metal and CUDA. The codebase deliberately avoids frameworks in favor of direct hardware access.
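
The KV-cache idea mentioned above can be sketched in a few lines of C. This is an illustrative sketch only, not ds4's actual data structure: the `kv_cache` struct, its per-layer flat layout, and all function names are assumptions. The point it shows is that each generated token appends one key and one value row, and later tokens reuse every cached row instead of recomputing them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-layer KV cache: row p of k and v holds the key and
 * value vector of the token at position p. */
typedef struct {
    int max_ctx; /* maximum number of positions (context size) */
    int dim;     /* length of each key/value vector */
    int len;     /* positions filled so far */
    float *k, *v;
} kv_cache;

/* Allocate room for max_ctx positions. Returns 0 on success, -1 on
 * allocation failure. */
int kv_init(kv_cache *c, int max_ctx, int dim) {
    assert(max_ctx > 0 && dim > 0);
    size_t n = (size_t)max_ctx * (size_t)dim;
    c->max_ctx = max_ctx;
    c->dim = dim;
    c->len = 0;
    c->k = calloc(n, sizeof(float));
    c->v = calloc(n, sizeof(float));
    if (!c->k || !c->v) {
        free(c->k);
        free(c->v);
        c->k = c->v = NULL;
        return -1;
    }
    return 0;
}

void kv_free(kv_cache *c) {
    free(c->k);
    free(c->v);
    c->k = c->v = NULL;
}

/* Append the key/value vectors of the newest token. Returns -1 when
 * the context window is full. */
int kv_append(kv_cache *c, const float *k, const float *v) {
    if (c->len >= c->max_ctx) return -1;
    size_t off = (size_t)c->len * (size_t)c->dim;
    memcpy(c->k + off, k, (size_t)c->dim * sizeof(float));
    memcpy(c->v + off, v, (size_t)c->dim * sizeof(float));
    c->len++;
    return 0;
}

/* Raw attention scores of query q against every cached key.
 * scores must have room for c->len floats. Scaling, softmax and the
 * weighted sum over values are left out to keep the sketch short. */
void kv_scores(const kv_cache *c, const float *q, float *scores) {
    for (int p = 0; p < c->len; p++) {
        const float *key = c->k + (size_t)p * (size_t)c->dim;
        float s = 0.0f;
        for (int i = 0; i < c->dim; i++) s += q[i] * key[i];
        scores[p] = s;
    }
}
```

A decoder would typically keep one such cache per transformer layer, call `kv_append` once per generated token, and size `max_ctx` from the configured context length. Memory use grows linearly with context size, which is why that setting matters on consumer hardware.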

Self-Hosting & Configuration

  • Build from source with make (no package manager needed)
  • Download GGUF model files from Hugging Face or other sources
  • Configure GPU layers and context size via CLI flags
  • Set batch size and thread count for CPU-only inference
  • No Docker required; runs as a single native binary
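
After downloading a model, it can be useful to confirm the file really is GGUF before pointing an engine at it. The sketch below is a standalone helper, not part of ds4. It assumes the GGUF version 2 or later header layout (a 4-byte `GGUF` magic, a 32-bit version, then 64-bit tensor and metadata counts, all little-endian). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define GGUF_HEADER_SIZE 24 /* magic + version + two 64-bit counts */

typedef struct {
    uint32_t version;
    uint64_t tensor_count;
    uint64_t metadata_kv_count;
} gguf_header;

/* Read little-endian integers byte by byte so the code does not depend
 * on host endianness or alignment. */
static uint32_t rd_u32le(const unsigned char *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t rd_u64le(const unsigned char *p) {
    return (uint64_t)rd_u32le(p) | (uint64_t)rd_u32le(p + 4) << 32;
}

/* Parse the fixed-size header from a buffer. Returns 0 on success and
 * -1 if the buffer is too short or the magic does not match. */
int gguf_parse_header(const unsigned char *buf, size_t len, gguf_header *h) {
    assert(buf != NULL && h != NULL);
    if (len < GGUF_HEADER_SIZE) return -1;
    if (memcmp(buf, "GGUF", 4) != 0) return -1;
    h->version = rd_u32le(buf + 4);
    h->tensor_count = rd_u64le(buf + 8);
    h->metadata_kv_count = rd_u64le(buf + 16);
    return 0;
}

/* Convenience wrapper: read the first bytes of a file and parse them.
 * Returns -1 if the file cannot be opened or is not GGUF. */
int gguf_read_header(const char *path, gguf_header *h) {
    unsigned char buf[GGUF_HEADER_SIZE];
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    return gguf_parse_header(buf, n, h);
}
```

Calling `gguf_read_header("model.gguf", &h)` with a placeholder path and checking for a zero return is enough to catch truncated downloads or HTML error pages saved under a `.gguf` name. It does not validate tensors or metadata.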

Key Features

  • Small C codebase with minimal external dependencies
  • Native Metal acceleration for Apple Silicon Macs
  • CUDA support for NVIDIA GPUs
  • GGUF model format for efficient quantized inference
  • Created by antirez with the same simplicity principles as Redis

Comparison with Similar Tools

  • llama.cpp — broader model support; ds4 is DeepSeek-specialized and simpler
  • Ollama — user-friendly wrapper; ds4 is a raw inference engine
  • vLLM — production server; ds4 targets single-user local inference
  • ExLlamaV2 — Python-based; ds4 is pure C with no runtime dependencies

FAQ

Q: Which models does ds4 support?
A: It is designed specifically for DeepSeek 4 Flash models in GGUF format.

Q: Do I need a GPU?
A: No. It runs on CPU, but Metal (Mac) or CUDA (NVIDIA) acceleration significantly improves speed.

Q: Who is antirez?
A: Salvatore Sanfilippo, the creator of Redis. He is known for writing high-quality, minimal C programs.

Q: How does it compare to llama.cpp?
A: ds4 is narrower in scope, targeting only DeepSeek 4 Flash models. This focus allows a simpler codebase and potentially better optimization for that model family.
