# Paperless-ngx: Self-Hosted Document Management with OCR

> Paperless-ngx is an open-source document management system that scans, OCRs, indexes, and archives all your physical and digital documents for full-text search.

## Install

Save the command under Quick Use as a script file and run it.

## Quick Use

```bash
docker run -d --name paperless \
  -p 8000:8000 \
  -v paperless-data:/usr/src/paperless/data \
  -v paperless-media:/usr/src/paperless/media \
  -v paperless-consume:/usr/src/paperless/consume \
  -e PAPERLESS_SECRET_KEY=your-secret-key \
  ghcr.io/paperless-ngx/paperless-ngx:latest
```

Open `http://localhost:8000`, create an admin account, then drop documents into the consume folder. Paperless-ngx also needs a reachable Redis broker (`PAPERLESS_REDIS`), so for anything beyond a quick look the Docker Compose setup under Self-Hosting is the more complete starting point.

## Intro

**Paperless-ngx** is a community-supported document management system that transforms your physical and digital documents into a searchable online archive. It automatically OCRs, indexes, tags, and categorizes every document you feed it, making decades of paperwork instantly searchable.

With 38K+ GitHub stars and a GPL-3.0 license, Paperless-ngx is one of the most popular self-hosted document management systems, used by people who want to go paperless while keeping full privacy and data ownership.
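The Quick Use command above passes a placeholder value for `PAPERLESS_SECRET_KEY`. A minimal sketch of generating a random value instead, assuming a POSIX-style shell with `head`, `od`, and `tr` available; the helper name `generate_secret_key` is hypothetical and not part of Paperless-ngx:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a random 64-character hex string
# that can be used as the value of PAPERLESS_SECRET_KEY.
generate_secret_key() {
  # Read 32 random bytes, hex-encode them, and strip spaces and newlines.
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}
```

The output can replace the placeholder, for example `-e PAPERLESS_SECRET_KEY="$(generate_secret_key)"`. It is probably best to generate the value once and store it, since the key should stay the same across container restarts.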
## What Paperless-ngx Does

- **Document Ingestion**: Drop PDFs, images, or Office docs into a folder and Paperless processes them automatically
- **OCR**: Tesseract-powered OCR extracts text from scanned documents and images (100+ languages)
- **Full-Text Search**: Search across all document content, not just filenames
- **Auto-Tagging**: Machine learning-powered automatic classification with tags, document types, and correspondents
- **Email Consumption**: Automatically import documents from email attachments
- **Scanner Integration**: Works with any scanner that can output to a folder or email
- **Mobile Scanning**: Upload from phone camera or scanning apps
- **File Naming**: Automatic, template-based file renaming and organization on disk
- **Multi-user**: Role-based access with per-user document visibility

## Architecture

```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│    Web UI    │────▶│  Paperless   │────▶│  PostgreSQL  │
│  (Angular)   │     │    Server    │     │   + Redis    │
└──────────────┘     │   (Django)   │     └──────────────┘
                     └──────┬───────┘
                            │
         ┌──────────────────┼──────────────────┐
         │                  │                  │
   ┌─────┴─────┐      ┌─────┴─────┐      ┌─────┴─────┐
   │  Consume  │      │ Tesseract │      │ Gotenberg │
   │  Folder   │      │   (OCR)   │      │ (Convert) │
   └───────────┘      └───────────┘      └───────────┘
```

## Self-Hosting

### Docker Compose (Recommended)

```yaml
services:
  paperless-webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    ports:
      - "8000:8000"
    environment:
      PAPERLESS_REDIS: redis://redis:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_DBNAME: paperless
      PAPERLESS_DBUSER: paperless
      PAPERLESS_DBPASS: paperless
      PAPERLESS_SECRET_KEY: your-very-long-secret-key
      # chi_sim is not preinstalled in the image; OCR_LANGUAGES installs it
      PAPERLESS_OCR_LANGUAGES: chi-sim
      PAPERLESS_OCR_LANGUAGE: eng+chi_sim
      PAPERLESS_TIME_ZONE: Asia/Shanghai
      # Point Paperless at the tika and gotenberg services defined below
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
    volumes:
      - data:/usr/src/paperless/data
      - media:/usr/src/paperless/media
      - consume:/usr/src/paperless/consume
      - export:/usr/src/paperless/export
    depends_on:
      - db
      - redis
      - gotenberg
      - tika

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless
      POSTGRES_DB: paperless
    volumes:
      - pgdata:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine

  gotenberg:
    image: gotenberg/gotenberg:8
    command: gotenberg --chromium-disable-javascript=true

  tika:
    image: apache/tika:latest

volumes:
  data:
  media:
  pgdata:
  consume:
  export:
```

## Workflow

### 1. Ingest Documents

```
Methods:
├── Drop files into /consume folder
├── Upload via web UI (drag & drop)
├── Email (IMAP polling)
├── Mobile app (scan & upload)
└── API upload
```

### 2. Automatic Processing

```
Document dropped
  → Detect file type (PDF, image, Office, etc.)
  → Convert Office documents to PDF if needed (Tika + Gotenberg)
  → OCR text extraction (Tesseract), producing a PDF/A archive copy
  → Full-text indexing
  → ML-based auto-tagging
  → Auto-assign correspondent & document type
  → Rename and archive file
  → Thumbnail generation
```

### 3. Search & Organize

```
Search: "invoice 2024 electricity"
  → Results with highlighted matching text
  → Filter by date range, tags, correspondent
  → Sort by relevance, date, or title
```

## Key Features

### Auto-Classification

Paperless learns from your manual tagging and starts auto-classifying:

```
After you tag 10+ electricity bills:
  → New electricity bills auto-tagged as "Bills" + "Electricity"
  → Correspondent auto-set to "Power Company"
  → Document type auto-set to "Invoice"
```

### File Naming Templates

```
# Template (set via PAPERLESS_FILENAME_FORMAT): {created_year}/{correspondent}/{title}
# Result:
2024/
├── Amazon/
│   ├── Order-123-receipt.pdf
│   └── Order-456-receipt.pdf
├── City Power/
│   ├── January-2024-bill.pdf
│   └── February-2024-bill.pdf
└── Insurance Co/
    └── Policy-renewal-2024.pdf
```

### API

```bash
# Upload document (repeat the tags field once per tag ID)
curl -X POST http://localhost:8000/api/documents/post_document/ \
  -H "Authorization: Token YOUR_TOKEN" \
  -F "document=@invoice.pdf" \
  -F "tags=1" \
  -F "tags=2" \
  -F "correspondent=3"

# Search documents
curl "http://localhost:8000/api/documents/?query=invoice+2024" \
  -H "Authorization: Token YOUR_TOKEN"
```

## Paperless-ngx vs Alternatives

| Feature | Paperless-ngx | Docspell | Mayan EDMS | Teedy |
|---------|--------------|----------|------------|-------|
| Open Source | Yes (GPL-3.0) | Yes (AGPL) | Yes (Apache) | Yes (GPL) |
| GitHub Stars | 38K | 1.5K | 500 | 2K |
| OCR | Tesseract (100+ lang) | Tesseract | Tesseract | Tesseract |
| Auto-tagging | ML-based | Rule-based | Manual | Tags |
| Email intake | Yes | Yes | Yes | No |
| Mobile app | Community apps | No | No | No |
| Full-text search | Yes (Whoosh) | Yes (Solr) | Yes | Yes |

## FAQ

**Q: Does it support Chinese OCR?**

A: Yes. Set `PAPERLESS_OCR_LANGUAGE=eng+chi_sim` to recognize both English and Simplified Chinese. The Docker image preinstalls only a handful of languages, so also set `PAPERLESS_OCR_LANGUAGES=chi-sim` (note the hyphen) to install the Simplified Chinese data. OCR quality depends on the scan quality; 300 DPI or higher works best.

**Q: How much storage do I need?**

A: It depends on document count and size. Each document is stored as the original file plus an archived PDF and a thumbnail, roughly 1.5–2× the original size. About 1,000 typical documents need 2–5 GB of storage.

**Q: Can multiple users share it?**

A: Yes. It supports multiple users, each with distinct permissions and document visibility. Administrators can set global tags and document types.

## Sources & Credits

- GitHub: [paperless-ngx/paperless-ngx](https://github.com/paperless-ngx/paperless-ngx) (38K+ ⭐, GPL-3.0)
- Website: [docs.paperless-ngx.com](https://docs.paperless-ngx.com)

---

Source: https://tokrepo.com/en/workflows/de0041a5-34b7-11f1-9bc6-00163e2b0d79

Author: Script Depot
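
As a supplement to the storage FAQ above, its rule of thumb amounts to simple arithmetic: document count × average original size × an overhead factor of 1.5 to 2. A rough sketch of that estimate, assuming `awk` is available; the function name `estimate_storage_gb` and the 2 MB average file size in the usage line are illustrative assumptions, not figures from Paperless-ngx:

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate disk usage from the 1.5-2x rule of thumb.
# Usage: estimate_storage_gb <document_count> <average_original_size_in_MB>
estimate_storage_gb() {
  awk -v n="$1" -v mb="$2" 'BEGIN {
    low  = n * mb * 1.5 / 1000   # original + archive + thumbnail, low end
    high = n * mb * 2.0 / 1000   # same, high end (1 GB taken as 1000 MB)
    printf "%.1f-%.1f GB\n", low, high
  }'
}
```

For example, `estimate_storage_gb 1000 2` prints `3.0-4.0 GB`, which sits inside the 2–5 GB range quoted for 1,000 typical documents.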