Scripts · Apr 10, 2026 · 3 min read

Paperless-ngx — Self-Hosted Document Management with OCR

Paperless-ngx is an open-source document management system that ingests your scanned and digital documents, runs OCR on them, and indexes and archives them for full-text search.

TL;DR
Paperless-ngx OCRs and indexes your scanned and digital documents for full-text search, all self-hosted.
§01

What it is

Paperless-ngx is an open-source document management system that digitizes, OCRs, indexes, and archives both physical and digital documents. Drop a scanned document or PDF into the system, and it automatically extracts text via OCR, applies tags, assigns correspondents, and makes everything searchable. It runs self-hosted via Docker.

This tool is for individuals and organizations who want to go paperless without relying on cloud document services. It is also useful for developers building document processing pipelines.

§02

How it saves time or tokens

Paperless-ngx automates the tedious process of organizing documents. The OCR engine extracts text from scanned images, making them searchable without manual data entry. Machine learning-based auto-tagging learns your categorization patterns and applies them to new documents. For AI workflows, the full-text search API provides document retrieval that can feed into RAG systems.
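
As a rough illustration of the RAG hand-off, the sketch below turns a list of search hits into a single context string for a prompt. It assumes each hit is a dictionary with `title` and `content` keys (where `content` is the OCR text); the function name `build_context` and the `max_chars` budget are hypothetical and not part of Paperless-ngx.

```python
def build_context(results, max_chars=2000):
    """Join search hits into one context string of roughly max_chars characters."""
    parts = []
    used = 0
    for doc in results:
        remaining = max_chars - used
        if remaining <= 0:
            break
        # Assumed fields: 'title' and 'content' (the extracted OCR text).
        snippet = f"[{doc['title']}]\n{doc.get('content', '').strip()}"
        snippet = snippet[:remaining]  # trim the last hit to fit the budget
        parts.append(snippet)
        used += len(snippet)
    # Separators are not counted against the budget, so the limit is approximate.
    return "\n\n".join(parts)
```

The character budget is a crude stand-in for a token limit; a real pipeline would likely chunk and rank the text before prompting.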

§03

How to use

  1. Deploy Paperless-ngx via Docker Compose.
  2. Configure consumption directories for document ingestion.
  3. Drop documents into the consumption folder.
  4. Search and manage documents through the web UI.

# Clone the repository and enter the Compose directory
git clone https://github.com/paperless-ngx/paperless-ngx.git
cd paperless-ngx/docker/compose

# Pick a Compose file for your database backend (PostgreSQL shown here)
cp docker-compose.postgres.yml docker-compose.yml

# Review docker-compose.env and adjust settings such as PAPERLESS_OCR_LANGUAGE

# Start the stack
docker compose up -d

# Create admin user
docker compose exec webserver python3 manage.py createsuperuser

# Access at http://localhost:8000
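
Step 3 can also be scripted. The following is a minimal sketch, assuming the consumption folder is reachable as a local directory on the host; the helper name `drop_into_consume` and the `SUPPORTED` set are hypothetical, and the extension list only covers the formats named in this article.

```python
import shutil
from pathlib import Path

# Formats named in this article; the real list of supported types is longer.
SUPPORTED = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}

def drop_into_consume(src, consume_dir):
    """Copy one file into the consumption folder and return the new path."""
    src = Path(src)
    if src.suffix.lower() not in SUPPORTED:
        raise ValueError(f"unsupported file type: {src.suffix}")
    dest = Path(consume_dir) / src.name
    if dest.exists():
        # Refuse to overwrite a file that is still waiting to be consumed.
        raise FileExistsError(dest)
    shutil.copy2(src, dest)  # copy2 also preserves the modification time
    return dest
```

A call might look like `drop_into_consume("scan_001.pdf", "./consume")`, where both the file name and the folder path are placeholders for your own setup.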
§04

Example

API usage for document search:

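import requests

api_url = 'http://localhost:8000/api'
headers = {'Authorization': 'Token your-api-token'}  # replace with your own token

# Full-text search across all documents
response = requests.get(
    f'{api_url}/documents/',
    headers=headers,
    params={'query': 'invoice 2026'},
    timeout=30,
)
response.raise_for_status()  # fail loudly on auth or server errors

for doc in response.json()['results']:
    # 'created' holds the document date
    print(f"{doc['title']} - {doc['created']}")
    # 'tags' is a list of numeric tag IDs, not tag names
    print(f"Tag IDs: {doc['tags']}")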
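
The search endpoint returns results one page at a time, so a single request only shows the first page. The sketch below follows the `next` link until it runs out. `iter_results` and its `get_json` parameter are hypothetical names; `get_json` is any callable that takes a URL and returns the decoded JSON page, which keeps the helper independent of a particular HTTP library.

```python
def iter_results(first_page_url, get_json):
    """Yield every item from a paginated list endpoint, page by page."""
    url = first_page_url
    while url:
        page = get_json(url)
        yield from page["results"]
        url = page.get("next")  # None on the last page ends the loop

# Possible wiring with the requests setup from the example above:
#   get_json = lambda url: requests.get(url, headers=headers, timeout=30).json()
#   for doc in iter_results(f"{api_url}/documents/?query=invoice", get_json):
#       print(doc["title"])
```

The same helper could walk the tags endpoint to build an ID-to-name lookup for the numeric tag IDs that documents carry.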
§06

Common pitfalls

  • OCR quality depends on scan quality. Ensure documents are scanned at 300+ DPI for reliable text extraction.
  • The initial setup requires Docker and basic Docker Compose knowledge. The configuration file has many options.
  • Storage grows with your document collection. Plan disk space for the originals, the archived versions produced by OCR, and the generated thumbnails.
  • Auto-tagging requires training data. The ML classifier needs at least 10 documents per tag before it starts making useful suggestions.
  • Paperless-ngx does not handle encrypted PDFs. Decrypt them before ingestion.
  • Review the official documentation before deploying to production to ensure compatibility with your specific environment and requirements.
  • Start with default settings and customize incrementally. Changing too many configuration options at once makes debugging harder.
  • Keep your installation updated to the latest stable version. Security patches and bug fixes are released regularly.
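
For the encrypted-PDF pitfall, a quick pre-flight check can flag suspect files before they reach the consumption folder. This is a heuristic sketch, not a PDF parser: it only looks for the `/Encrypt` key that encrypted PDFs carry in their trailer, so it can produce false positives. The function names are hypothetical.

```python
from pathlib import Path

def looks_encrypted(pdf_bytes):
    """Heuristic: encrypted PDFs reference an /Encrypt dictionary."""
    return b"/Encrypt" in pdf_bytes

def find_encrypted_pdfs(folder):
    """Return the PDFs in a folder that appear to be encrypted."""
    return sorted(
        path for path in Path(folder).glob("*.pdf")
        if looks_encrypted(path.read_bytes())
    )
```

Files flagged this way still need to be decrypted with a separate tool before ingestion.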

Frequently Asked Questions

What OCR engine does Paperless-ngx use?

Paperless-ngx runs OCR through OCRmyPDF, which uses the Tesseract engine by default; Tesseract supports over 100 languages. It also supports alternative OCR backends. The OCR runs automatically on ingested documents.

Can I use Paperless-ngx on a Raspberry Pi?

Yes. Paperless-ngx has ARM Docker images that run on Raspberry Pi 4 and later. Performance will be slower than on a full server, especially for OCR processing, but it works for personal document management.

Does it support mobile access?

The web UI is responsive and works on mobile browsers. There are also community-maintained mobile apps for Android and iOS that connect to your Paperless-ngx instance.

How does auto-tagging work?

Paperless-ngx uses a machine learning classifier that learns from your tagging patterns. After you manually tag enough documents, it suggests tags for new documents. The more you use it, the more accurate suggestions become.
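
To see which tags still lack examples, you can count documents per tag from API results. This is a small sketch, assuming each document is a dictionary whose `tags` field is a list of numeric tag IDs; `undertrained_tags` is a hypothetical helper, and the default threshold of 10 is the figure quoted in this article rather than an official limit.

```python
from collections import Counter

def undertrained_tags(documents, min_docs=10):
    """Return IDs of tags used on fewer than min_docs documents."""
    counts = Counter(tag_id for doc in documents for tag_id in doc["tags"])
    # Tags that appear on no document at all are not visible here.
    return sorted(tag_id for tag_id, n in counts.items() if n < min_docs)
```

Tags returned by this check are the ones worth applying by hand to a few more documents before relying on suggestions.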

Can I import existing digital documents?

Yes. Drop PDF, PNG, JPG, TIFF, and other document formats into the consumption directory. Paperless-ngx processes them automatically. You can also use the web UI or API to upload documents directly.
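
An API upload can be sketched as follows, assuming the `post_document` endpoint with the file sent in a form field named `document` and the same token header as the search example. The helper name `upload_document` and the injectable `post` parameter are hypothetical; `post` defaults to `requests.post` and exists mainly so the function can be exercised without a running server.

```python
from pathlib import Path

def upload_document(path, api_url, token, title=None, post=None):
    """Send one file to Paperless-ngx and return the decoded JSON response."""
    if post is None:
        import requests  # third-party; the same library as the search example
        post = requests.post
    path = Path(path)
    data = {"title": title} if title else {}
    with path.open("rb") as fh:
        response = post(
            f"{api_url}/documents/post_document/",
            headers={"Authorization": f"Token {token}"},
            files={"document": (path.name, fh)},
            data=data,
            timeout=60,
        )
    response.raise_for_status()
    return response.json()
```

Consumption runs asynchronously, so a successful response means the file was accepted for processing, not that it is already searchable.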
