Key Features
Accurate Conversion
- Tables — Preserved as Markdown tables with alignment
- Images — Extracted and saved as separate files
- Equations — Converted to LaTeX notation
- Code blocks — Detected and formatted as fenced code blocks
- Multi-column — Reads columns in the correct order
- Headers/footers — Automatically removed
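In a Markdown table, column alignment is set by the delimiter row under the header: `:---` means left, `:---:` center, and `---:` right. The sketch below reads those alignments back from a converted table. It assumes the output follows GitHub-flavored Markdown table syntax, and `column_alignments` is a hypothetical helper written for illustration, not part of Marker.

```python
def column_alignments(table_md: str) -> list[str]:
    """Return 'left', 'center', 'right', or 'default' for each column of a
    GitHub-flavored Markdown table, read from its delimiter row (line 2).
    Hypothetical helper; assumes GFM table syntax."""
    lines = [ln.strip() for ln in table_md.strip().splitlines()]
    if len(lines) < 2:
        raise ValueError("a Markdown table needs a header row and a delimiter row")
    # Drop the optional outer pipes, then split the delimiter row into cells.
    cells = [c.strip() for c in lines[1].strip("|").split("|")]
    alignments = []
    for cell in cells:
        starts, ends = cell.startswith(":"), cell.endswith(":")
        if starts and ends:
            alignments.append("center")
        elif ends:
            alignments.append("right")
        elif starts:
            alignments.append("left")
        else:
            alignments.append("default")
    return alignments


sample = """
| Item | Qty | Price |
|:-----|:---:|------:|
| Pen  |  2  |  1.50 |
"""
print(column_alignments(sample))  # ['left', 'center', 'right']
```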
Performance
- GPU-accelerated — 10x faster on a CUDA GPU than on CPU
- Batch processing — Convert entire directories
- Multi-language — 90+ languages via the Surya OCR engine
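Batch conversion comes down to walking a directory and converting each PDF in it. The following is a minimal sketch of that pattern, not Marker's own implementation: `convert_directory` is hypothetical, and `convert_pdf` is an assumed callable (PDF path in, Markdown text out) standing in for whatever converter you plug in. The demo uses a stub converter and placeholder files rather than real PDFs.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def convert_directory(
    in_dir: Path,
    out_dir: Path,
    convert_pdf: Callable[[Path], str],
    workers: int = 4,
) -> list[Path]:
    """Convert each *.pdf directly inside in_dir to a .md file in out_dir.

    convert_pdf is a hypothetical callable (PDF path -> Markdown text);
    substitute the converter you actually use.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    pdfs = sorted(in_dir.glob("*.pdf"))  # non-recursive, stable order

    def convert_one(pdf: Path) -> Path:
        target = out_dir / f"{pdf.stem}.md"
        target.write_text(convert_pdf(pdf), encoding="utf-8")
        return target

    # A thread pool keeps the sketch simple; with a GPU-backed converter,
    # fewer workers may be needed to stay within device memory.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(convert_one, pdfs))


# Demo: stub converter and placeholder files, not real PDFs.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "pdfs")
    src.mkdir()
    for name in ("a.pdf", "b.pdf", "notes.txt"):
        (src / name).write_bytes(b"placeholder")
    written = convert_directory(src, Path(tmp, "md"), lambda p: f"# {p.stem}\n")
    written_names = [p.name for p in written]
    first_text = written[0].read_text(encoding="utf-8")

print(written_names)  # ['a.md', 'b.md']
```

For real workloads, Marker's own command-line tooling already converts whole directories; check its README for the current commands and worker options rather than relying on this sketch.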
Output Formats
- Markdown (clean, LLM-ready)
- JSON (structured with metadata)
- HTML
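Structured JSON output is most useful when you walk it programmatically. The sketch below tallies block types across a nested block tree. The schema is an assumption: it supposes each node is an object with a `block_type` string and an optional `children` list, and the sample tree is made up, so compare against a real output file before relying on these key names. `count_block_types` is a hypothetical helper.

```python
from __future__ import annotations

import json
from collections import Counter


def count_block_types(node: dict, counts: Counter | None = None) -> Counter:
    """Recursively tally "block_type" values in a nested block tree.

    Assumed schema (hypothetical): each node is a dict with a "block_type"
    string and an optional "children" list that is null or missing on leaves.
    """
    if counts is None:
        counts = Counter()
    counts[node.get("block_type", "unknown")] += 1
    for child in node.get("children") or []:
        count_block_types(child, counts)
    return counts


# Made-up sample tree for illustration; not real Marker output.
sample = json.loads("""
{"block_type": "Document", "children": [
  {"block_type": "Page", "children": [
    {"block_type": "SectionHeader", "children": null},
    {"block_type": "Text", "children": null},
    {"block_type": "Table", "children": null}
  ]},
  {"block_type": "Page", "children": [
    {"block_type": "Text"}
  ]}
]}
""")
counts = count_block_types(sample)
print(dict(counts))
```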
Comparison
| Feature | Marker | PyPDF | pdfplumber |
|---|---|---|---|
| Tables | ✅ | ❌ | ✅ |
| Images | ✅ | ❌ | ❌ |
| Equations | ✅ | ❌ | ❌ |
| Multi-column | ✅ | ❌ | ❌ |
| OCR (scanned) | ✅ | ❌ | ❌ |
| Speed | Fast (GPU) | Fast | Medium |
FAQ
Q: What is Marker?
A: A fast, accurate PDF-to-Markdown converter that handles tables, images, equations, code blocks, and multi-column layouts. It is GPU-accelerated, supports 90+ languages, and has 33K+ GitHub stars.
Q: Can Marker handle scanned PDFs?
A: Yes. It includes OCR through the Surya engine, which supports 90+ languages, so it can convert both native (text-based) and scanned PDFs.