Configs · Jun 1, 2026 · 3 min read

DeepSeek-OCR — High-Accuracy Optical Context Compression

An OCR model and toolkit from DeepSeek AI that extracts text from images and documents with high accuracy, designed for feeding structured content into LLM pipelines.

Direct install command
npx -y tokrepo@latest install 0f6cbaa9-5df7-11f1-9bc6-00163e2b0d79 --target codex

Run this only after confirming the plan with a dry run.

Introduction

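DeepSeek-OCR is a document OCR toolkit from DeepSeek AI that extracts text from images, scanned PDFs, and photographs with high accuracy. It is designed to produce clean, structured output that can be fed directly into LLM workflows and AI pipelines. The "optical context compression" in the title refers to encoding a page image into a compact set of vision tokens, so that a document can occupy less model context than the equivalent run of text tokens.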

What DeepSeek-OCR Does

  • Extracts text from scanned documents, photographs, and screenshots
  • Handles complex layouts including tables, multi-column text, and mixed content
  • Outputs structured JSON with positional metadata for each text region
  • Supports batch processing of large document collections
  • Provides both a Python API and CLI interface
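
Batch processing of a collection usually starts by enumerating the input files and grouping them into fixed-size batches. The sketch below shows that step with the standard library only. The names `find_documents` and `iter_batches` and the extension set are hypothetical helpers for illustration, not part of DeepSeek-OCR.

```python
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator

# Assumption: these extensions are illustrative, not an official list of supported formats.
DOCUMENT_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".pdf"}


def find_documents(root: Path) -> list[Path]:
    """Return all files under root with a known extension, sorted for reproducible runs."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in DOCUMENT_EXTENSIONS
    )


def iter_batches(items: Iterable[Path], size: int) -> Iterator[list[Path]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls the next `size` items; an empty list means the input is exhausted.
    while batch := list(islice(iterator, size)):
        yield batch
```

Each batch can then be handed to whichever interface is in use (the Python API or the CLI), which keeps memory use bounded on large collections.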

Architecture Overview

DeepSeek-OCR uses a vision transformer backbone trained on a large corpus of document images. The pipeline first performs layout analysis to segment the page into regions (text blocks, tables, figures, headers), then runs text recognition on each region. A post-processing stage reconstructs reading order and structures the output as clean text or annotated JSON. The model is optimized for CUDA GPUs but also runs on CPU with reduced throughput.
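
To make the reading-order step concrete, here is a minimal sketch of one common heuristic: group regions into columns by horizontal overlap, then read columns left to right and regions top to bottom. This is a generic illustration, not DeepSeek-OCR's actual post-processing; the `Region` type and `reading_order` function are hypothetical, and full-width elements such as page headers would need separate handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A detected page region; coordinates are pixels with the origin at the top-left."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def reading_order(regions: list[Region]) -> list[Region]:
    """Order regions column by column (left to right), top to bottom within a column."""
    columns: list[list[Region]] = []
    # Visit regions left to right so each column is seeded by its leftmost region.
    for region in sorted(regions, key=lambda r: r.x0):
        for column in columns:
            # A region joins a column when their horizontal extents overlap.
            left = max(region.x0, min(r.x0 for r in column))
            right = min(region.x1, max(r.x1 for r in column))
            if left < right:
                column.append(region)
                break
        else:
            columns.append([region])
    columns.sort(key=lambda col: min(r.x0 for r in col))
    return [r for col in columns for r in sorted(col, key=lambda r: r.y0)]
```

Sorting purely by vertical position would interleave lines from adjacent columns, which is why the column grouping comes first.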

Self-Hosting & Configuration

  • Install via pip: pip install deepseek-ocr
  • Model weights are downloaded automatically on first run
  • Configure output format (plain text, JSON, Markdown) via CLI flags
  • Set GPU device ID or force CPU mode with environment variables
  • Batch process directories with deepseek-ocr extract --input ./scans/
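
The batch command and device selection above can be driven from a script. The sketch below wraps the `deepseek-ocr extract --input` command exactly as quoted in the list and adds no other flags. The page does not name the environment variables it refers to, so the use of `CUDA_VISIBLE_DEVICES` (the standard CUDA variable for selecting or hiding GPUs) is an assumption, as are the helper names; check the project's documentation for the variables the tool actually reads.

```python
from __future__ import annotations

import os
import subprocess
from pathlib import Path


def build_extract_command(input_dir: Path) -> list[str]:
    """Build the batch command quoted in the list above; no extra flags are assumed."""
    return ["deepseek-ocr", "extract", "--input", str(input_dir)]


def build_environment(gpu_id: int | None) -> dict[str, str]:
    """Copy the current environment and pin one GPU, or hide all GPUs when gpu_id is None."""
    env = dict(os.environ)
    # Assumption: the tool respects CUDA_VISIBLE_DEVICES; an empty value hides every GPU.
    env["CUDA_VISIBLE_DEVICES"] = "" if gpu_id is None else str(gpu_id)
    return env


def run_extract(input_dir: Path, gpu_id: int | None = 0) -> int:
    """Run the command and return its exit code.

    Raises FileNotFoundError if the `deepseek-ocr` executable is not on PATH.
    """
    completed = subprocess.run(
        build_extract_command(input_dir),
        env=build_environment(gpu_id),
        check=False,
    )
    return completed.returncode
```

Calling `run_extract(Path("./scans"), gpu_id=None)` would attempt a CPU-only run under these assumptions.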

Key Features

  • High accuracy on complex document layouts including tables and multi-column text
  • Structured JSON output with bounding boxes and confidence scores
  • Markdown output mode for direct use in LLM prompts
  • Batch processing with parallel execution for large collections
  • Open model weights for self-hosted deployment
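
Confidence scores are most useful as a gate before text reaches an LLM prompt. The following sketch splits regions into accepted and needs-review sets by a threshold. The payload shape and the field names `regions`, `text`, `bbox`, and `confidence` are assumptions made for illustration, not the documented output schema, and the sample values are invented.

```python
import json

# Hypothetical payload: field names and values are illustrative only.
SAMPLE = """
{"regions": [
  {"text": "Invoice 2024-001", "bbox": [40, 32, 410, 70], "confidence": 0.98},
  {"text": "T0tal due: 1,2O0", "bbox": [40, 620, 300, 650], "confidence": 0.41},
  {"text": "Payment terms: 30 days", "bbox": [40, 660, 380, 690], "confidence": 0.93}
]}
"""


def split_by_confidence(payload: dict, threshold: float) -> tuple[list[dict], list[dict]]:
    """Split regions into (accepted, needs_review) using a confidence threshold."""
    accepted, needs_review = [], []
    for region in payload["regions"]:
        target = accepted if region["confidence"] >= threshold else needs_review
        target.append(region)
    return accepted, needs_review


payload = json.loads(SAMPLE)
accepted, needs_review = split_by_confidence(payload, threshold=0.8)
# Only accepted regions are joined into the text that would be placed in a prompt.
prompt_text = "\n".join(region["text"] for region in accepted)
```

The threshold of 0.8 is arbitrary here; a sensible value depends on the document type and on how costly a misread is downstream.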

Comparison with Similar Tools

  • Tesseract — traditional OCR engine; DeepSeek-OCR uses modern vision transformers for better accuracy on complex layouts
  • Surya — multilingual OCR; DeepSeek-OCR focuses on contextual compression for LLM pipelines
  • PaddleOCR — general-purpose OCR toolkit; DeepSeek-OCR is optimized for document understanding
  • Azure Document Intelligence — cloud OCR service; DeepSeek-OCR is fully self-hosted and open source

FAQ

Q: What languages does it support? A: It supports English, Chinese, and several other major languages. Check the documentation for the full list.

Q: Does it require a GPU? A: A CUDA GPU is recommended for best performance. CPU inference works but is significantly slower.

Q: Can it extract tables as structured data? A: Yes. Tables are detected and output as structured JSON with row and column information.

Q: Is it free for commercial use? A: Check the license in the repository. DeepSeek models typically have specific license terms.
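
For the table question above, structured table output can be rendered as Markdown before it is placed in a prompt. This is a minimal sketch assuming each cell arrives as a dictionary with zero-based `row` and `col` indices and a `text` field; that shape and the function name `table_to_markdown` are hypothetical, so adapt the keys to the schema the tool actually emits.

```python
def table_to_markdown(cells: list[dict]) -> str:
    """Render cells of the form {"row": int, "col": int, "text": str} as a Markdown table.

    The first row is treated as the header; missing cells become empty strings.
    """
    if not cells:
        return ""
    n_rows = max(c["row"] for c in cells) + 1
    n_cols = max(c["col"] for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for cell in cells:
        # Escape pipes so cell text cannot break the table syntax.
        grid[cell["row"]][cell["col"]] = cell["text"].replace("|", "\\|")
    lines = ["| " + " | ".join(row) + " |" for row in grid]
    # The separator row goes directly under the header row.
    lines.insert(1, "| " + " | ".join("---" for _ in range(n_cols)) + " |")
    return "\n".join(lines)
```

Filling a dense grid first means that cells the recognizer missed show up as visible blanks rather than silently shifting later columns.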
