Skills · Apr 10, 2026 · 3 min read

Paperless-ngx — Self-Hosted Document Management with OCR

Paperless-ngx is an open-source document management system that scans, OCRs, indexes, and archives all your physical and digital documents for full-text search.

Agent-ready

Safe staging for this asset

This asset is staged first. The copied prompt asks you to inspect the staged files before activating scripts, MCP config, or global config.

Stage only · 29/100 · Policy: staging
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Stage only
Trust: Established
Entry point: step-1.md
Safe staging command:
npx -y tokrepo@latest install de0041a5-34b7-11f1-9bc6-00163e2b0d79 --target codex

Files are staged first; activation requires reviewing the README and the staged plan.

TL;DR
Paperless-ngx scans, OCRs, and indexes your documents for full-text search, all self-hosted.
§01

What it is

Paperless-ngx is an open-source document management system that digitizes, OCRs, indexes, and archives both physical and digital documents. Drop a scanned document or PDF into the system, and it automatically extracts text via OCR, applies tags, assigns correspondents, and makes everything searchable. It runs self-hosted via Docker.

This tool is for individuals and organizations who want to go paperless without relying on cloud document services. It is also useful for developers building document processing pipelines.

§02

How it saves time or tokens

Paperless-ngx automates the tedious process of organizing documents. The OCR engine extracts text from scanned images, making them searchable without manual data entry. Machine learning-based auto-tagging learns your categorization patterns and applies them to new documents. For AI workflows, the full-text search API provides document retrieval that can feed into RAG systems.
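
As a rough sketch of the RAG hand-off mentioned above, the helper below turns a list of search results into one prompt context. It assumes each result is a dict with `title` and `content` keys (the OCR text); `build_context` and the character budget are illustrative names, not part of Paperless-ngx.

```python
def build_context(results, max_chars=2000):
    """Concatenate document snippets until the character budget is used up."""
    parts = []
    remaining = max_chars
    for doc in results:
        if remaining <= 0:
            break
        title = doc.get("title", "untitled")
        # Collapse the line breaks and runs of spaces that OCR text tends to contain
        text = " ".join(doc.get("content", "").split())
        snippet = f"[{title}] {text}"[:remaining]
        parts.append(snippet)
        remaining -= len(snippet)
    return "\n".join(parts)


# Hand-written sample data standing in for real search results
sample = [
    {"title": "Invoice 42", "content": "Total   due:\n 120 EUR"},
    {"title": "Lease", "content": "Rent is due monthly."},
]
print(build_context(sample))
```

The budget counts snippet characters only, so the joined string can exceed it by one newline per document.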

§03

How to use

  1. Deploy Paperless-ngx via Docker Compose.
  2. Configure consumption directories for document ingestion.
  3. Drop documents into the consumption folder.
  4. Search and manage documents through the web UI.
# Clone and start with Docker Compose
git clone https://github.com/paperless-ngx/paperless-ngx.git
cd paperless-ngx/docker/compose

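# Pick a database variant as the compose file (assumes the repo ships
# docker-compose.postgres.yml alongside docker-compose.env)
cp docker-compose.postgres.yml docker-compose.yml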

# Start the stack
docker compose up -d

# Create admin user
docker compose exec webserver python3 manage.py createsuperuser

# Access at http://localhost:8000
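
Steps 2 and 3 can also be scripted. The sketch below copies a file to a staging folder first and then renames it into the consumption folder, so the consumer only ever sees complete files. The folder names and `drop_into_consume` are hypothetical; point `consume_dir` at whichever host folder your compose file maps to the consumption directory, and keep both folders on the same filesystem.

```python
import shutil
import tempfile
from pathlib import Path


def drop_into_consume(source, consume_dir, staging_dir):
    """Copy `source` to a staging folder, then move it into the consume folder.

    Both folders are assumed to be on the same filesystem so the final
    rename happens in a single step.
    """
    source = Path(source)
    staged = Path(staging_dir) / source.name
    target = Path(consume_dir) / source.name
    shutil.copy2(source, staged)
    staged.replace(target)  # rename into place
    return target


# Demo with temporary folders standing in for the real paths
with tempfile.TemporaryDirectory() as root:
    root = Path(root)
    consume = root / "consume"
    staging = root / "staging"
    consume.mkdir()
    staging.mkdir()
    scan = root / "scan.pdf"
    scan.write_bytes(b"%PDF-1.4 demo")
    print(drop_into_consume(scan, consume, staging).name)
```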
§04

Example

API usage for document search:

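import requests

api_url = 'http://localhost:8000/api'
# Replace with a token generated for your user
headers = {'Authorization': 'Token your-api-token'}

# Full-text search via the `query` parameter
response = requests.get(
    f'{api_url}/documents/',
    headers=headers,
    params={'query': 'invoice 2026'},
    timeout=10,
)
response.raise_for_status()  # fail early on auth or server errors

for doc in response.json()['results']:
    print(f"{doc['title']} - {doc['created']}")
    # `tags` holds tag IDs, not tag names
    print(f"Tags: {doc['tags']}")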
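
The search response is paginated, so a single request returns only the first page of `results`. A minimal sketch of walking every page by following the `next` link is shown here; `iter_results` and the `fetch` callable are illustrative, and the fake pages stand in for real HTTP responses so the example runs offline.

```python
def iter_results(fetch, first_url):
    """Yield every document across pages by following the `next` link."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get("results", [])
        url = page.get("next")  # None on the last page ends the loop


# Stand-in for a real HTTP call: maps a URL to its decoded JSON page
fake_pages = {
    "page1": {"results": [{"id": 1}, {"id": 2}], "next": "page2"},
    "page2": {"results": [{"id": 3}], "next": None},
}
ids = [doc["id"] for doc in iter_results(fake_pages.__getitem__, "page1")]
print(ids)
```

Against a live instance, `fetch` could be a small wrapper such as `lambda url: requests.get(url, headers=headers, timeout=10).json()`.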
§06

Common pitfalls

  • OCR quality depends on scan quality. Ensure documents are scanned at 300+ DPI for reliable text extraction.
  • The initial setup requires Docker and basic Docker Compose knowledge. The configuration file has many options.
  • Storage grows with your document collection. Plan disk space for both originals and generated thumbnails.
  • Auto-tagging requires training data. The ML classifier needs at least 10 documents per tag before it starts making useful suggestions.
  • Paperless-ngx does not handle encrypted PDFs. Decrypt them before ingestion.
  • Review the official documentation before deploying to production to ensure compatibility with your specific environment and requirements.
  • Start with default settings and customize incrementally. Changing too many configuration options at once makes debugging harder.
  • Keep your installation updated to the latest stable version. Security patches and bug fixes are released regularly.
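
For the encrypted-PDF pitfall, a quick pre-ingestion check can flag files that need decrypting first. The function below is only a heuristic sketch: it looks for an `/Encrypt` entry, which encrypted PDFs carry in their trailer, and it can misfire on unusual files. `looks_encrypted` is a hypothetical helper; a dedicated PDF library gives a more reliable answer.

```python
def looks_encrypted(pdf_bytes):
    """Heuristic: a PDF that references an /Encrypt dictionary is likely encrypted."""
    return pdf_bytes.startswith(b"%PDF-") and b"/Encrypt" in pdf_bytes


# Tiny hand-written byte strings, not complete PDFs, to exercise the check
plain = b"%PDF-1.7\ntrailer\n<< /Root 1 0 R >>\n%%EOF"
locked = b"%PDF-1.7\ntrailer\n<< /Root 1 0 R /Encrypt 2 0 R >>\n%%EOF"
print(looks_encrypted(plain), looks_encrypted(locked))
```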

Frequently asked questions

What OCR engine does Paperless-ngx use?

Paperless-ngx runs OCR through OCRmyPDF, which uses the Tesseract engine; Tesseract supports over 100 languages. OCR runs automatically on ingested documents.

Can I use Paperless-ngx on a Raspberry Pi?

Yes. Paperless-ngx has ARM Docker images that run on Raspberry Pi 4 and later. Performance will be slower than on a full server, especially for OCR processing, but it works for personal document management.

Does it support mobile access?

The web UI is responsive and works on mobile browsers. There are also community-maintained mobile apps for Android and iOS that connect to your Paperless-ngx instance.

How does auto-tagging work?

Paperless-ngx uses a machine learning classifier that learns from your tagging patterns. After you manually tag enough documents, it suggests tags for new documents. The more you use it, the more accurate suggestions become.

Can I import existing digital documents?

Yes. Drop PDF, PNG, JPG, TIFF, and other document formats into the consumption directory. Paperless-ngx processes them automatically. You can also use the web UI or API to upload documents directly.
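
For the API upload route, the sketch below builds a multipart request for the `documents/post_document/` endpoint using only the standard library. It constructs the request without sending it; `build_upload_request`, the token, and the file contents are placeholders, and the `document` and `title` form fields should be checked against the API documentation for your version.

```python
import urllib.request
import uuid


def build_upload_request(api_url, token, filename, data, title=None):
    """Build (but do not send) a multipart POST for the document upload endpoint."""
    boundary = uuid.uuid4().hex
    parts = []
    if title is not None:
        parts.append(
            (
                f"--{boundary}\r\n"
                'Content-Disposition: form-data; name="title"\r\n\r\n'
                f"{title}\r\n"
            ).encode()
        )
    # The file itself goes in the `document` form field
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="document"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + data
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        f"{api_url}/documents/post_document/",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


req = build_upload_request(
    "http://localhost:8000/api", "your-api-token", "scan.pdf", b"%PDF-1.4 demo", title="Scan"
)
print(req.get_method(), req.full_url)
```

Sending is left to the caller, for example with `urllib.request.urlopen(req, timeout=30)`. With the `requests` library the same upload is shorter: pass the file through the `files` argument of `requests.post`.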
