Scripts · Apr 10, 2026 · 3 min read

Paperless-ngx — Self-Hosted Document Management with OCR

Paperless-ngx is an open-source document management system that ingests, OCRs, indexes, and archives your scanned and digital documents for full-text search.

Introduction

Paperless-ngx is a community-supported document management system that transforms your physical and digital documents into a searchable online archive. It automatically OCRs, indexes, tags, and categorizes every document you feed it — making decades of paperwork instantly searchable.

With 38K+ GitHub stars and a GPL-3.0 license, Paperless-ngx is one of the most popular self-hosted document management systems (DMS), widely used by people going paperless who want complete privacy and data ownership.

What Paperless-ngx Does

  • Document Ingestion: Drop PDFs, images, or Office docs into a folder — Paperless processes them automatically
  • OCR: Tesseract-powered OCR extracts text from scanned documents and images (100+ languages)
  • Full-Text Search: Search across all document content, not just filenames
  • Auto-Tagging: Machine-learning-based automatic assignment of tags, document types, and correspondents
  • Email Consumption: Automatically import documents from email attachments
  • Scanner Integration: Works with any scanner that can output to a folder or email
  • Mobile Scanning: Upload from phone camera or scanning apps
  • File Naming: Automatic, template-based file renaming and organization on disk
  • Multi-user: Role-based access with per-user document visibility

Architecture

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  Web UI      │────▶│  Paperless   │────▶│  PostgreSQL  │
│  (Angular)   │     │  Server      │     │  + Redis     │
└──────────────┘     │  (Django)    │     └──────────────┘
                     └──────┬───────┘
                            │
         ┌──────────────────┼──────────────────┐
         │                  │                  │
    ┌────┴────┐        ┌────┴────┐        ┌────┴────┐
    │ Consume │        │Tesseract│        │Gotenberg│
    │ Folder  │        │  (OCR)  │        │ + Tika  │
    └─────────┘        └─────────┘        └─────────┘
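When the stack is running, a quick way to confirm that the server can reach its network services (PostgreSQL, Redis, Gotenberg, Tika) is a plain TCP connection test. This is a minimal sketch, not a Paperless-ngx tool: the host names assume the Compose service names used in this article, the ports are each image's default, and it only works from a container on the same Compose network, where those names resolve.

```python
import socket

# Assumed host/port pairs: Compose service names plus each image's default port.
SERVICES = {
    "PostgreSQL": ("db", 5432),
    "Redis": ("redis", 6379),
    "Gotenberg": ("gotenberg", 3000),
    "Tika": ("tika", 9998),
}


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


def check_all(services: dict = SERVICES) -> dict:
    """Map each service name to whether it accepted a TCP connection."""
    return {name: is_reachable(host, port) for name, (host, port) in services.items()}


# Example, run from a container on the Compose network:
# for name, ok in check_all().items():
#     print(f"{name:10} {'ok' if ok else 'unreachable'}")
```

A TCP connect only proves the port is open; it does not check credentials or service health, which keeps the sketch dependency-free.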

Self-Hosting

Docker Compose (Recommended)

services:
  paperless-webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    ports:
      - "8000:8000"
    environment:
      PAPERLESS_REDIS: redis://redis:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_DBNAME: paperless
      PAPERLESS_DBUSER: paperless
      PAPERLESS_DBPASS: paperless
      # Replace with a long random string
      PAPERLESS_SECRET_KEY: your-very-long-secret-key
      # OCR languages (Tesseract codes joined with "+")
      PAPERLESS_OCR_LANGUAGE: eng+chi_sim
      # Extra language packs to install; the image preinstalls only a few
      PAPERLESS_OCR_LANGUAGES: chi-sim
      PAPERLESS_TIME_ZONE: Asia/Shanghai
      # Send Office documents to Tika (text) and Gotenberg (PDF conversion)
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
    volumes:
      - data:/usr/src/paperless/data
      - media:/usr/src/paperless/media
      # Host folders, so you can drop files in and collect exports directly
      - ./consume:/usr/src/paperless/consume
      - ./export:/usr/src/paperless/export
    depends_on:
      - db
      - redis
      - gotenberg
      - tika

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless
      POSTGRES_DB: paperless
    volumes:
      - pgdata:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine

  gotenberg:
    image: gotenberg/gotenberg:8
    command: gotenberg --chromium-disable-javascript=true

  tika:
    image: apache/tika:latest

volumes:
  data:
  media:
  pgdata:
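The `PAPERLESS_SECRET_KEY` placeholder should be replaced with a long random value before the first start. Python's standard `secrets` module is one way to generate it; the 48-byte length here is an arbitrary choice for illustration, not a documented Paperless-ngx requirement.

```python
import secrets
import string

# Characters produced by token_urlsafe (the base64url alphabet).
URLSAFE_ALPHABET = set(string.ascii_letters + string.digits + "-_")


def make_secret_key(num_bytes: int = 48) -> str:
    """Return a random URL-safe string for PAPERLESS_SECRET_KEY.

    token_urlsafe base64url-encodes num_bytes of randomness, so the
    default of 48 bytes gives a 64-character key.
    """
    return secrets.token_urlsafe(num_bytes)


# Paste the printed value into the Compose file in place of the placeholder.
print(make_secret_key())
```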

Workflow

1. Ingest Documents

Methods:
├── Drop files into /consume folder
├── Upload via web UI (drag & drop)
├── Email (IMAP polling)
├── Mobile app (scan & upload)
└── API upload
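For scanners that save to a network share rather than directly into the consume folder, a small script can forward new scans. This is a minimal sketch, assuming a hypothetical scanner output directory and the host folder mounted as `/usr/src/paperless/consume`; the suffix list is a conservative subset chosen for illustration, not the full set of formats Paperless-ngx accepts.

```python
import shutil
from pathlib import Path
from typing import List

# Hypothetical allow-list: a conservative subset of formats Paperless-ngx ingests.
ALLOWED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def forward_scans(scan_dir: Path, consume_dir: Path) -> List[Path]:
    """Copy supported files from scan_dir into consume_dir.

    Returns the destination paths that were written. A file whose name
    already exists in consume_dir is skipped instead of overwritten.
    """
    copied = []
    for src in sorted(scan_dir.iterdir()):
        if not src.is_file() or src.suffix.lower() not in ALLOWED_SUFFIXES:
            continue  # skip folders and unsupported file types
        dest = consume_dir / src.name
        if dest.exists():
            continue
        shutil.copy2(src, dest)  # copies content and timestamps
        copied.append(dest)
    return copied


# Example with hypothetical paths (scanner share -> host consume folder):
# forward_scans(Path("/srv/scanner-output"), Path("./consume"))
```

Paperless-ngx removes files from the consume folder once it has consumed them, so the skip check only guards against copying the same file twice before it has been picked up.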

2. Automatic Processing

Document dropped
  → Detect file type (PDF, image, Office, etc.)
  → Convert to PDF/A if needed (Gotenberg)
  → OCR text extraction (Tesseract)
  → Full-text indexing
  → ML-based auto-tagging
  → Auto-assign correspondent & document type
  → Rename and archive file
  → Thumbnail generation
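Processing runs asynchronously: an upload returns the ID of a consumption task, and the document appears once that task finishes. The sketch below polls for the outcome. It assumes the `GET /api/tasks/?task_id=<id>` endpoint returns a JSON list whose first item has a Celery-style `status` field (`SUCCESS`, `FAILURE`, ...), as described in the Paperless-ngx REST API docs; the HTTP call is injected as `fetch_tasks` so any client, or a fake in tests, can be used.

```python
import time
from typing import Callable, List

# Celery-style terminal states (assumed from the tasks endpoint's status values).
TERMINAL_STATES = {"SUCCESS", "FAILURE", "REVOKED"}


def wait_for_task(
    task_id: str,
    fetch_tasks: Callable[[str], List[dict]],
    interval: float = 2.0,
    timeout: float = 300.0,
) -> dict:
    """Poll until the task reaches a terminal state, then return its record.

    fetch_tasks(task_id) must return the decoded JSON list from
    GET /api/tasks/?task_id=<task_id> (sent with the Authorization header).
    Raises TimeoutError if no terminal state is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        tasks = fetch_tasks(task_id)
        if tasks and tasks[0].get("status") in TERMINAL_STATES:
            return tasks[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout}s")
        time.sleep(interval)
```

On success, the returned record's `related_document` field should hold the new document's ID; treat that field name as an assumption to check against your Paperless-ngx version.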

3. Search & Organize

Search: "invoice 2024 electricity"
  → Results with highlighted matching text
  → Filter by date range, tags, correspondent
  → Sort by relevance, date, or title
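The same search and filters can be expressed as query parameters on the documents endpoint. In this sketch, `query` is the full-text parameter used in the API examples in this article; the filter names `tags__id__all`, `correspondent__id`, `created__date__gt` and `ordering` are assumptions based on the API's Django-filter naming and should be checked against the API docs for your version.

```python
from typing import List, Optional
from urllib.parse import urlencode


def build_search_url(
    base_url: str,
    text: str,
    tag_ids: Optional[List[int]] = None,
    correspondent_id: Optional[int] = None,
    created_after: Optional[str] = None,  # ISO date, e.g. "2024-01-01"
    ordering: Optional[str] = None,  # e.g. "-created" for newest first
) -> str:
    """Build a GET URL for /api/documents/ combining full-text search and filters."""
    params = {"query": text}
    if tag_ids:
        # Documents must carry all of these tags (assumed filter name).
        params["tags__id__all"] = ",".join(str(t) for t in tag_ids)
    if correspondent_id is not None:
        params["correspondent__id"] = str(correspondent_id)
    if created_after:
        params["created__date__gt"] = created_after
    if ordering:
        params["ordering"] = ordering
    return f"{base_url.rstrip('/')}/api/documents/?{urlencode(params)}"


print(build_search_url("http://localhost:8000", "invoice 2024 electricity", tag_ids=[1, 2]))
# → http://localhost:8000/api/documents/?query=invoice+2024+electricity&tags__id__all=1%2C2
```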

Key Features

Auto-Classification

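Paperless learns from your manual tagging and starts auto-classifying new documents (for tags, correspondents, and document types whose matching algorithm is set to "Auto"):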

After you tag 10+ electricity bills:
  → New electricity bills auto-tagged as "Bills" + "Electricity"
  → Correspondent auto-set to "Power Company"
  → Document type auto-set to "Invoice"
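The idea of learning from already-tagged documents can be illustrated with a tiny text classifier. Paperless-ngx trains its own scikit-learn-based model for this; the sketch below is not that code and swaps in a simpler technique, multinomial naive Bayes with add-one smoothing, in pure Python. The training texts and labels are made up for the example.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTagger:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.doc_counts: Counter = Counter()  # training documents per label
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)  # word counts per label
        self.vocab: Set[str] = set()  # every word seen in training

    def train(self, text: str, label: str) -> None:
        words = tokenize(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def predict(self, text: str) -> Optional[str]:
        """Return the label with the highest log-probability, or None if untrained."""
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


tagger = NaiveBayesTagger()
tagger.train("City Power electricity bill kWh usage amount due", "Electricity")
tagger.train("City Power monthly electricity statement kWh", "Electricity")
tagger.train("Amazon order receipt shipped item total", "Receipt")
tagger.train("Amazon order confirmation receipt payment", "Receipt")
print(tagger.predict("Your electricity bill: 320 kWh"))  # Electricity
```

As with Paperless-ngx itself, predictions only become useful once each label has several examples; with one or two documents per label the classifier is mostly guessing.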

File Naming Templates

# PAPERLESS_FILENAME_FORMAT={created_year}/{correspondent}/{title}
# Result:
2024/
├── Amazon/
│   ├── Order-123-receipt.pdf
│   └── Order-456-receipt.pdf
├── City Power/
│   ├── January-2024-bill.pdf
│   └── February-2024-bill.pdf
└── Insurance Co/
    └── Policy-renewal-2024.pdf
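Conceptually, each placeholder in the format string is replaced by the document's metadata and the file extension is appended. The sketch below shows only that substitution using Python's `str.format`; it is an illustration, not Paperless-ngx's implementation, which has its own rules for missing values, unsafe characters and duplicate names. The metadata values are hypothetical.

```python
def render_path(template: str, fields: dict, extension: str = ".pdf") -> str:
    """Fill {placeholders} in template from fields and append the extension.

    Slashes inside values are replaced with '-', so only the '/' characters
    written in the template itself create folders.
    """
    safe = {k: str(v).replace("/", "-").replace("\\", "-") for k, v in fields.items()}
    return template.format(**safe) + extension


# Hypothetical metadata for one document.
doc = {"created_year": 2024, "correspondent": "City Power", "title": "January-2024-bill"}
print(render_path("{created_year}/{correspondent}/{title}", doc))
# → 2024/City Power/January-2024-bill.pdf
```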

API

# Upload a document (repeat the tags field once per tag ID);
# the response is the UUID of the consumption task
curl -X POST http://localhost:8000/api/documents/post_document/ \
  -H "Authorization: Token YOUR_TOKEN" \
  -F "document=@invoice.pdf" \
  -F "tags=1" \
  -F "tags=2" \
  -F "correspondent=3"

# Search documents
curl "http://localhost:8000/api/documents/?query=invoice+2024" \
  -H "Authorization: Token YOUR_TOKEN"
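List endpoints such as `/api/documents/` return paginated JSON with `count`, `next`, `previous` and `results` fields (the standard Django REST Framework layout), so a client that needs every match follows `next` until it is empty. A minimal standard-library sketch using the same token header as the curl examples; the token and URL are placeholders.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional


def iter_results(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every item of a paginated list endpoint by following 'next' links."""
    url: Optional[str] = first_url
    while url:
        page = fetch(url)
        yield from page.get("results", [])
        url = page.get("next")  # None (JSON null) on the last page


def make_fetch(token: str) -> Callable[[str], dict]:
    """Return a fetch function that sends the same Token header as the curl examples."""

    def fetch(url: str) -> dict:
        request = urllib.request.Request(url, headers={"Authorization": f"Token {token}"})
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    return fetch


# Example (placeholder token, server from the Compose setup):
# fetch = make_fetch("YOUR_TOKEN")
# url = "http://localhost:8000/api/documents/?query=invoice+2024"
# for doc in iter_results(url, fetch):
#     print(doc["id"], doc["title"])
```

Keeping `fetch` separate from the paging logic makes the loop easy to test with canned pages and lets you swap in another HTTP client.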

Paperless-ngx vs Alternatives

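| Feature          | Paperless-ngx          | Docspell   | Mayan EDMS   | Teedy     |
|------------------|------------------------|------------|--------------|-----------|
| Open Source      | Yes (GPL-3.0)          | Yes (AGPL) | Yes (Apache) | Yes (GPL) |
| GitHub Stars     | 38K                    | 1.5K       | 500          | 2K        |
| OCR              | Tesseract (100+ lang)  | Tesseract  | Tesseract    | Tesseract |
| Auto-tagging     | ML-based               | Rule-based | Manual       | Tags      |
| Email intake     | Yes                    | Yes        | Yes          | No        |
| Mobile app       | Community apps         | No         | No           | No        |
| Full-text search | Yes (Whoosh)           | Yes (Solr) | Yes          | Yes       |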

FAQ

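Q: Does it support Chinese OCR? A: Yes. Set PAPERLESS_OCR_LANGUAGE=eng+chi_sim to recognize both English and Simplified Chinese; with the Docker image, also set PAPERLESS_OCR_LANGUAGES=chi-sim so the Chinese language pack is installed. OCR quality depends on scan quality: 300 DPI or higher works best.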
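With the Docker image, the two OCR variables use different formats: `PAPERLESS_OCR_LANGUAGE` joins Tesseract codes with `+`, while `PAPERLESS_OCR_LANGUAGES` lists extra language packs to install, separated by spaces and written as Debian package suffixes (underscores become dashes). A small helper can derive both from one list; the set of preinstalled languages below is an assumption based on the Paperless-ngx docs and may differ between image versions.

```python
from typing import Dict, Iterable, List

# Assumed default languages shipped in the image (English, German, French, Italian, Spanish).
PREINSTALLED = ("eng", "deu", "fra", "ita", "spa")


def ocr_env(codes: List[str], preinstalled: Iterable[str] = PREINSTALLED) -> Dict[str, str]:
    """Return the two OCR environment variables for the given Tesseract codes."""
    installed = set(preinstalled)
    # Only languages missing from the image need a package; chi_sim -> chi-sim.
    extra = [c.replace("_", "-") for c in codes if c not in installed]
    return {
        "PAPERLESS_OCR_LANGUAGE": "+".join(codes),
        "PAPERLESS_OCR_LANGUAGES": " ".join(extra),
    }


print(ocr_env(["eng", "chi_sim"]))
# → {'PAPERLESS_OCR_LANGUAGE': 'eng+chi_sim', 'PAPERLESS_OCR_LANGUAGES': 'chi-sim'}
```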

Q: How much storage do I need? A: It depends on the number and size of your documents. Each document is stored as the original file plus an archived PDF and a thumbnail, roughly 1.5–2× the original size in total. About 1,000 typical documents need 2–5 GB of storage.
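The estimate is a simple product: document count × average original size × overhead factor. In the sketch below, the 1.5 MB average original size is an assumed example, not a measured Paperless-ngx figure; decimal units are used (1 GB = 1,000 MB).

```python
def estimate_storage_gb(doc_count: int, avg_original_mb: float, overhead: float) -> float:
    """Total storage in GB for originals plus archived PDFs and thumbnails.

    overhead is total stored size divided by original size
    (roughly 1.5 to 2.0 according to the FAQ answer above).
    """
    return doc_count * avg_original_mb * overhead / 1000


# 1,000 documents averaging an assumed 1.5 MB each:
low = estimate_storage_gb(1000, 1.5, 1.5)   # 2.25 GB
high = estimate_storage_gb(1000, 1.5, 2.0)  # 3.0 GB
print(f"{low:.2f}-{high:.2f} GB")
```

That lands inside the 2–5 GB range quoted above; larger or color scans push the average original size, and therefore the total, toward the upper end.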

Q: Can multiple users share it? A: Yes. It supports multiple users, each with distinct permissions and document visibility. Administrators can set global tags and document types.
