Scripts · Mar 31, 2026 · 2 min read

Marker — Convert PDF to Markdown with High Accuracy

Fast, accurate PDF to Markdown + JSON converter. Handles tables, images, equations, code blocks, and multi-column layouts. GPU-accelerated. 33K+ GitHub stars.

TokRepo Picks · Community
Quick Use

Use it first, then decide how deep to go

pip install marker-pdf

# Convert a single PDF
marker_single input.pdf --output_dir output/ --output_format markdown

Or use in Python:

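from marker.converters.pdf import PdfConverter
from marker.models import create_model_dict

# Load the layout and OCR models once, then reuse the converter across files
converter = PdfConverter(artifact_dict=create_model_dict())
result = converter("report.pdf")
print(result.markdown)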

Intro

Marker converts PDF files to Markdown and JSON with high accuracy and speed. It handles complex layouts, including tables, images, equations, code blocks, multi-column text, and footnotes, and it strips page headers and footers. It runs on GPU for fast batch processing and builds on the Surya OCR engine for multi-language support. The project has 33,000+ GitHub stars.

Best for: RAG pipelines, document ingestion, PDF data extraction, knowledge base building

Works with: Any LLM pipeline, including LangChain, LlamaIndex, Haystack, and custom RAG systems
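For RAG ingestion, the Markdown output is usually split into chunks before embedding. The following is a minimal sketch, not part of Marker itself: `chunk_by_heading` is a hypothetical helper that splits on ATX headings and ignores `#` lines inside fenced code blocks. It assumes the converted Markdown uses `#`-style headings.

```python
import re

# Matches ATX headings such as "# Title" or "### Subsection"
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_heading(markdown: str) -> list[dict]:
    """Split Markdown into one chunk per heading section (hypothetical helper)."""
    chunks = []
    title, lines = "", []
    in_fence = False
    for line in markdown.splitlines():
        # Track fenced code blocks so "# comment" lines are not treated as headings
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            # Close the previous section before starting a new one
            body = "\n".join(lines).strip()
            if title or body:
                chunks.append({"title": title, "text": body})
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    # Flush the final section
    body = "\n".join(lines).strip()
    if title or body:
        chunks.append({"title": title, "text": body})
    return chunks
```

Each chunk keeps its heading as `title`, which can be stored as metadata next to the embedding. Text before the first heading is returned with an empty title.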


Key Features

Accurate Conversion

  • Tables — Preserved as Markdown tables with alignment
  • Images — Extracted and saved as separate files
  • Equations — Converted to LaTeX notation
  • Code blocks — Detected and formatted with syntax highlighting
  • Multi-column — Correctly reads multi-column layouts in order
  • Headers/footers — Automatically removed
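Because tables arrive as Markdown tables, they can be turned back into structured rows with a few lines of code. This is a rough sketch under the assumption that the table is pipe-delimited with a header row followed by a separator row; `parse_markdown_table` is a hypothetical helper, and it does not handle escaped pipes inside cells.

```python
def parse_markdown_table(block: str) -> list[dict]:
    """Parse one pipe-delimited Markdown table into a list of row dicts."""

    def cells(line: str) -> list[str]:
        # Drop the outer pipes, then split on the inner ones
        return [c.strip() for c in line.strip().strip("|").split("|")]

    lines = [ln for ln in block.splitlines() if ln.strip()]
    if len(lines) < 2:
        return []
    header = cells(lines[0])
    # lines[1] is the separator row (for example |:---|---:|), so skip it
    rows = []
    for line in lines[2:]:
        rows.append(dict(zip(header, cells(line))))
    return rows
```

All values come back as strings; numeric conversion is left to the caller because the column types are not known in advance.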

Performance

  • GPU-accelerated — 10x faster with CUDA
  • Batch processing — Convert entire directories
  • Multi-language — 90+ languages via Surya OCR engine
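One way to convert a directory is to build one `marker_single` command per PDF and run them in sequence. The sketch below only assembles the commands; `build_commands` and `run_all` are hypothetical helpers, and the `--output_dir` and `--output_format` flags are assumed to match the installed Marker release, so check `marker_single --help` first.

```python
import subprocess
from pathlib import Path


def build_commands(input_dir: str, output_dir: str,
                   output_format: str = "markdown") -> list[list[str]]:
    """Return one marker_single argv list per PDF found in input_dir."""
    commands = []
    # Sort for a stable, reproducible processing order
    for pdf in sorted(Path(input_dir).glob("*.pdf")):
        commands.append([
            "marker_single", str(pdf),
            "--output_dir", output_dir,       # assumed flag name
            "--output_format", output_format,  # assumed flag name
        ])
    return commands


def run_all(commands: list[list[str]]) -> None:
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(build_commands("pdfs/", "output/"))` would process every PDF in `pdfs/`. Running files one at a time reloads the models for each process, so for large batches the Python API with a single reused converter is likely faster.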

Output Formats

  • Markdown (clean, LLM-ready)
  • JSON (structured with metadata)
  • HTML
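The JSON output is a tree of blocks, which makes it easy to inspect what a document contains. The sketch below counts blocks by type. It assumes each node carries a `block_type` string and an optional `children` list; those field names and the sample data are assumptions for illustration, so verify them against real output before relying on them.

```python
import json
from collections import Counter


def count_block_types(node: dict) -> Counter:
    """Walk a nested block tree and count how often each block type appears."""
    counts = Counter()
    stack = [node]
    while stack:
        current = stack.pop()
        block_type = current.get("block_type")  # assumed field name
        if block_type:
            counts[block_type] += 1
        # "children" may be missing, null, or an empty list
        stack.extend(current.get("children") or [])
    return counts


# Hypothetical sample shaped like a small two-page document
sample = json.loads("""
{"block_type": "Document", "children": [
  {"block_type": "Page", "children": [
    {"block_type": "SectionHeader", "children": null},
    {"block_type": "Table", "children": []},
    {"block_type": "Text"}
  ]},
  {"block_type": "Page", "children": [
    {"block_type": "Text"}
  ]}
]}
""")

counts = count_block_types(sample)
```

A quick count like this helps decide whether a document is table-heavy or mostly prose before choosing a chunking strategy.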

Comparison

Feature       | Marker | PyPDF | pdfplumber
Tables        |        |       |
Images        |        |       |
Equations     |        |       |
Multi-column  |        |       |
OCR (scanned) |        |       |
Speed (GPU)   | Fast   | Fast  | Medium

FAQ

Q: What is Marker?
A: A fast, accurate PDF to Markdown converter that handles tables, images, equations, code blocks, and multi-column layouts. It is GPU-accelerated, supports 90+ languages, and has 33K+ GitHub stars.

Q: Can Marker handle scanned PDFs?
A: Yes. It includes OCR via the Surya engine, which supports 90+ languages, so it works on both native and scanned PDFs.


Source & Thanks

Created by Datalab. Licensed under GPL-3.0. Repository: datalab-to/marker (33,000+ GitHub stars)
