# Marker — Convert PDF to Markdown with High Accuracy

> Fast, accurate PDF to Markdown + JSON converter. Handles tables, images, equations, code blocks, and multi-column layouts. GPU-accelerated. 33K+ GitHub stars.

## Install

Marker is distributed on PyPI as `marker-pdf`. The `pip install` command is the first line of the Quick Use block below.

## Quick Use

```bash
pip install marker-pdf

# Convert a single PDF; results are written to the directory given by --output_dir
marker_single input.pdf --output_dir output/ --output_format markdown
```

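Or use it from Python:

```python
from marker.converters.pdf import PdfConverter
from marker.models import create_model_dict

# Load the layout and OCR models once, then reuse the converter for every file
converter = PdfConverter(artifact_dict=create_model_dict())
rendered = converter("report.pdf")
print(rendered.markdown)
```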

---

## Intro

Marker converts PDF files to Markdown and JSON with high accuracy and speed. It handles complex layouts, including tables, images, equations, code blocks, multi-column text, headers/footers, and footnotes. It runs on a GPU for fast batch processing and is built on the Surya OCR engine for multi-language support. The project has 33,000+ GitHub stars.

**Best for**: RAG pipelines, document ingestion, PDF data extraction, knowledge base building
**Works with**: Any LLM pipeline — LangChain, LlamaIndex, Haystack, custom RAG systems
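
For RAG-style ingestion, the Markdown that Marker produces usually needs to be split into chunks before indexing. The sketch below is one possible approach, not part of Marker itself: `split_by_headings` is a hypothetical helper that cuts a Markdown string at each heading and skips `#` lines inside fenced code blocks. It uses only the standard library and assumes ATX-style (`#`) headings.

```python
import re
from typing import Dict, List

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def split_by_headings(markdown: str) -> List[Dict[str, str]]:
    """Split Markdown into chunks, one per heading (hypothetical helper)."""
    chunks: List[Dict[str, str]] = []
    title = ""            # text before the first heading gets an empty title
    body: List[str] = []
    in_fence = False      # '#' lines inside code fences are not headings

    def flush(chunk_title: str, chunk_body: List[str]) -> None:
        text = "\n".join(chunk_body).strip()
        if chunk_title or text:
            chunks.append({"title": chunk_title, "text": text})

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            flush(title, body)            # close the previous chunk
            title, body = match.group(2), []
        else:
            body.append(line)
    flush(title, body)                    # close the final chunk
    return chunks
```

Each chunk keeps its heading as `title`, which can be stored as metadata next to the embedding in LangChain, LlamaIndex, Haystack, or a custom pipeline.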

---

## Key Features

### Accurate Conversion
- **Tables** — Preserved as Markdown tables with alignment
- **Images** — Extracted and saved as separate files
- **Equations** — Converted to LaTeX notation
- **Code blocks** — Detected and formatted with syntax highlighting
- **Multi-column** — Correctly reads multi-column layouts in order
- **Headers/footers** — Automatically removed
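
Because tables arrive as ordinary pipe-delimited Markdown, they can be loaded back into structured rows with a few lines of code. The following is a minimal sketch, assuming a simple table with a header row, a separator row, and no escaped `|` characters inside cells. `parse_markdown_table` is a hypothetical helper and not part of Marker.

```python
from typing import Dict, List


def parse_markdown_table(lines: List[str]) -> List[Dict[str, str]]:
    """Parse a pipe-delimited Markdown table into a list of row dicts."""

    def cells(line: str) -> List[str]:
        # Drop the outer pipes, then split on the inner ones
        return [c.strip() for c in line.strip().strip("|").split("|")]

    header = cells(lines[0])
    rows: List[Dict[str, str]] = []
    for line in lines[2:]:        # lines[1] is the |---|---| separator row
        if not line.strip():      # a blank line ends the table
            break
        rows.append(dict(zip(header, cells(line))))
    return rows
```

For anything beyond simple tables (escaped pipes, inline code containing `|`), a full Markdown parser is the safer choice.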

### Performance
- **GPU-accelerated** — 10x faster with CUDA
- **Batch processing** — Convert entire directories
- **Multi-language** — 90+ languages via Surya OCR engine
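
Batch conversion can also be driven from Python. The sketch below is an illustration under stated assumptions rather than Marker's own batch implementation: `convert_directory` is a hypothetical helper that takes any callable mapping a PDF path to a Markdown string, so the loop can be exercised without loading models. Files that already have an output are skipped, which makes reruns cheap.

```python
from pathlib import Path
from typing import Callable, List


def convert_directory(
    in_dir: Path, out_dir: Path, convert: Callable[[Path], str]
) -> List[Path]:
    """Convert every *.pdf in in_dir and write one .md file per PDF to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for pdf in sorted(in_dir.glob("*.pdf")):
        target = out_dir / (pdf.stem + ".md")
        if target.exists():       # skip work already done on a previous run
            continue
        target.write_text(convert(pdf), encoding="utf-8")
        written.append(target)
    return written
```

With Marker, `convert` could be something like `lambda p: converter(str(p)).markdown`, where `converter` is a single `PdfConverter` instance created once so the models are loaded only one time.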

### Output Formats
- Markdown (clean, LLM-ready)
- JSON (structured with metadata)
- HTML

### Comparison
| Feature | Marker | PyPDF | pdfplumber |
|---------|--------|-------|------------|
| Tables | ✅ | ❌ | ✅ |
| Images | ✅ | ❌ | ❌ |
| Equations | ✅ | ❌ | ❌ |
| Multi-column | ✅ | ❌ | ❌ |
| OCR (scanned) | ✅ | ❌ | ❌ |
| Speed | Fast (GPU) | Fast | Medium |

---

## FAQ

**Q: What is Marker?**
A: A fast, accurate PDF to Markdown converter that handles tables, images, equations, code blocks, and multi-column layouts. GPU-accelerated with 90+ language support. 33K+ GitHub stars.

**Q: Can Marker handle scanned PDFs?**
A: Yes, it includes OCR via the Surya engine, supporting 90+ languages for both native and scanned PDFs.

---

## Source & Thanks

> Created by [Datalab](https://github.com/datalab-to). Licensed under GPL-3.0.
> [datalab-to/marker](https://github.com/datalab-to/marker) — 33,000+ GitHub stars

---
Source: https://tokrepo.com/en/workflows/42976daf-a56a-4152-9afb-d5b00d130a08
Author: Script Depot