# ArchiveBox: Self-Hosted Web Archiving Platform

> ArchiveBox is an open-source, self-hosted web archiver that saves URLs as local HTML, PDF, screenshots, WARC, and more. Feed it bookmarks, browser history, or RSS feeds and it preserves the pages for offline access.

## Quick Use

```bash
# Docker Compose (recommended)
mkdir -p ~/archivebox/data && cd ~/archivebox
curl -fsSL https://docker-compose.archivebox.io -o docker-compose.yml
docker compose run archivebox init --setup   # create the collection and an admin user
docker compose up -d                         # start the web server in the background

# Add URLs to archive
docker compose run archivebox add "https://example.com"
docker compose run archivebox add --depth=1 "https://news.ycombinator.com"   # also archive pages it links to
```

## Introduction

ArchiveBox preserves web content you care about before it disappears. It takes URLs from bookmarks, browser history, RSS feeds, or plain text and saves each page in multiple formats (HTML, PDF, screenshot, WARC), so you keep a local copy even after the original goes offline.

## What ArchiveBox Does

- Archives web pages in multiple formats: HTML, PDF, screenshot, WARC, media, and Git repos
- Accepts input from bookmark exports, browser history, RSS feeds, and plain URL lists
- Provides a web UI for browsing, searching, and managing your archive
- Extracts and saves embedded media, including images, videos, audio, and documents
- Schedules automatic archiving of RSS feeds and bookmark sources at a cron interval

## Architecture Overview

ArchiveBox is a Python application built on Django, with a SQLite database by default. It orchestrates a suite of external tools (wget, headless Chrome, yt-dlp (a youtube-dl fork), readability, mercury-parser) to capture each page in several formats at once. Each snapshot is stored as a directory of files with a JSON index, which keeps archives portable and tool-independent.
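Because each snapshot is a plain directory with a JSON index, ordinary shell tools can read the archive without ArchiveBox running. The function below is a minimal sketch, not part of ArchiveBox: `list_snapshots` is a hypothetical name, and it assumes snapshots live at `<data dir>/archive/<timestamp>/index.json` with a top-level `"url"` field in each `index.json`. It prints one `timestamp<TAB>url` line per snapshot.

```bash
# Hypothetical helper (not an ArchiveBox command): list snapshots in a data dir.
# ASSUMPTION: layout is <data_dir>/archive/<timestamp>/index.json and each
# index.json contains a "url": "..." pair for the archived page.
list_snapshots() {
  local data_dir="${1:-.}"
  local index url
  for index in "$data_dir"/archive/*/index.json; do
    [ -f "$index" ] || continue   # the glob matched nothing, so skip
    # Take the first "url": "..." pair; a real JSON parser (e.g. jq) is more robust.
    url=$(grep -o '"url": *"[^"]*"' "$index" | head -n 1 | sed 's/^"url": *"//; s/"$//')
    # The snapshot folder name is its timestamp.
    printf '%s\t%s\n' "$(basename "$(dirname "$index")")" "$url"
  done
}
```

For example, `list_snapshots ./data` would list a Docker Compose collection, assuming your compose file mounts `./data` as the data directory. The `grep`/`sed` extraction keeps the sketch dependency-free; for anything beyond a quick listing, parsing `index.json` with a proper JSON tool is the safer choice.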
## Self-Hosting & Configuration

- Deploy via Docker Compose, pip, or Homebrew on macOS and Linux
- Configure output formats, archiving depth, and tool preferences in `ArchiveBox.conf`
- Set up scheduled imports from RSS feeds, Pinboard, Pocket, or browser bookmark exports
- Use SQLite for small archives or PostgreSQL for larger collections
- Serve archives publicly or restrict access with Django authentication

## Key Features

- Multi-format preservation means a page is still saved even if one format fails to capture it
- Full-text search across all archived page content and metadata
- Browser extension and bookmarklet for one-click archiving of any page
- Portable archive format: each snapshot is a standalone folder of standard files
- Deduplication and incremental archiving to save storage on repeated URLs

## Comparison with Similar Tools

- **Wallabag**: Read-it-later app focused on article reading, not full multi-format archiving
- **SingleFile**: Browser extension that saves single pages, but lacks batch processing and scheduling
- **HTTrack**: Classic website copier for mirroring entire sites, but no PDF/screenshot/WARC support
- **Webrecorder/Conifer**: WARC-focused archiving with replay, but requires more technical setup
- **Pocket**: Cloud-based bookmarking with no self-hosted option or multi-format local storage

## FAQ

**Q: How much storage does ArchiveBox use per page?**
A: A typical page with all formats enabled uses 5-50 MB. Disabling formats such as screenshots or WARC reduces storage significantly.

**Q: Can I archive pages behind login walls?**
A: Yes. You can supply browser cookies or use a logged-in Chrome profile to archive authenticated content.

**Q: Does ArchiveBox respect robots.txt?**
A: By default, ArchiveBox respects robots.txt for wget fetches, but you can override this in the configuration for personal archiving.

**Q: Can I export my archive to another tool?**
A: Yes.
Archives are stored as standard files (HTML, PDF, PNG, WARC) in plain directories that any tool or file browser can access directly.

## Sources

- https://github.com/ArchiveBox/ArchiveBox
- https://archivebox.io

---

Source: https://tokrepo.com/en/workflows/358da384-39db-11f1-9bc6-00163e2b0d79
Author: Script Depot