Scripts · Apr 8, 2026 · 3 min read

Marker — Convert PDF to Markdown for AI Tools

High-accuracy PDF to Markdown converter optimized for AI pipelines. Marker handles tables, equations, code blocks, and multi-column layouts with deep learning OCR.

Script Depot · Community
Quick Use

Use it first, then decide how deep to go

pip install marker-pdf

# Convert a single PDF
marker_single input.pdf --output_dir output_dir/

# Convert a directory of PDFs
marker input_dir/ --output_dir output_dir/ --workers 4

# Python API
from marker.converters.pdf import PdfConverter
from marker.models import create_model_dict

models = create_model_dict()
converter = PdfConverter(artifact_dict=models)
rendered = converter("paper.pdf")
print(rendered.markdown)
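To keep the converted text instead of printing it, the quickstart above can be wrapped in a small helper. This is a minimal sketch: `save_markdown` and `convert_pdf` are hypothetical names, not part of Marker, and the code assumes `rendered.markdown` is a plain string, as in the snippet above.

```python
from pathlib import Path


def save_markdown(markdown: str, pdf_path: str, out_dir: str) -> Path:
    """Write Markdown text to <out_dir>/<pdf stem>.md and return the path."""
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / (Path(pdf_path).stem + ".md")
    target.write_text(markdown, encoding="utf-8")
    return target


def convert_pdf(pdf_path: str, out_dir: str) -> Path:
    """Convert one PDF with Marker and save the result (hypothetical helper)."""
    # Imported lazily so save_markdown can be used without marker-pdf installed.
    from marker.converters.pdf import PdfConverter
    from marker.models import create_model_dict

    converter = PdfConverter(artifact_dict=create_model_dict())
    rendered = converter(pdf_path)
    return save_markdown(rendered.markdown, pdf_path, out_dir)
```

Calling `convert_pdf("paper.pdf", "out")` would then leave the result in `out/paper.md`. When converting many files, it is likely worth building the converter once and reusing it rather than recreating it per file.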

What is Marker?

Marker is a deep learning PDF-to-Markdown converter designed for AI pipelines. It accurately extracts text, tables, equations, code blocks, and images from PDFs — including scanned documents. Unlike rule-based tools, Marker uses trained models for layout detection, OCR, table recognition, and equation conversion, achieving significantly higher accuracy on complex academic and technical documents.

Answer-Ready: Marker converts PDFs to clean Markdown using deep learning. It handles tables, equations, code blocks, multi-column layouts, and scanned documents. It is reported to run about 10x faster than similar tools and to reach 90%+ accuracy on academic papers, and it is commonly used for document ingestion in RAG pipelines. The project has 19k+ GitHub stars.

Best for: AI teams building RAG pipelines or processing technical PDFs.
Works with: Any LLM framework, LangChain, LlamaIndex.
Setup time: Under 3 minutes.
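For RAG ingestion, the Markdown that Marker produces usually needs to be split into chunks before embedding. The sketch below shows one simple approach, splitting at headings while leaving `#` lines inside fenced code blocks alone. `split_by_headings` is a hypothetical helper, not part of Marker, LangChain, or LlamaIndex, and it assumes the output uses `#`-style headings and triple-backtick fences.

```python
import re
from typing import List, Tuple

# ATX heading: one to six '#' characters followed by whitespace and a title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown: str) -> List[Tuple[str, str]]:
    """Split Markdown into (heading, body) chunks at ATX headings."""
    chunks: List[Tuple[str, str]] = []
    title, body = "", []
    in_fence = False
    for line in markdown.splitlines():
        # Toggle on fence lines so '# comment' inside code is not a heading.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            # Close the previous chunk before starting a new one.
            if title or any(s.strip() for s in body):
                chunks.append((title, "\n".join(body).strip()))
            title, body = match.group(2).strip(), []
        else:
            body.append(line)
    if title or any(s.strip() for s in body):
        chunks.append((title, "\n".join(body).strip()))
    return chunks
```

Each `(heading, body)` pair can then be embedded on its own, with the heading kept as metadata. Text that appears before the first heading is returned with an empty heading.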

Core Features

1. High-Accuracy Extraction

| Element | Accuracy |
|---|---|
| Body text | 95%+ |
| Tables | 90%+ |
| Equations (LaTeX) | 85%+ |
| Code blocks | 90%+ |
| Multi-column | 90%+ |

2. Batch Processing

# Process 1000 PDFs with 8 workers
marker input_dir/ --workers 8 --output_format markdown
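When the batch run is launched from a Python job rather than a shell, the same command can be assembled and started with `subprocess`. This is a sketch under stated assumptions: `build_marker_command` and `run_batch` are hypothetical helpers, and they use only the `--workers` and `--output_format` flags shown above.

```python
import subprocess
from pathlib import Path
from typing import List

ALLOWED_FORMATS = ("markdown", "json", "html")


def build_marker_command(input_dir: str, workers: int = 4,
                         output_format: str = "markdown") -> List[str]:
    """Build the argument list for a batch `marker` run (hypothetical helper)."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")
    return [
        "marker", str(Path(input_dir)),
        "--workers", str(workers),
        "--output_format", output_format,
    ]


def run_batch(input_dir: str, workers: int = 4,
              output_format: str = "markdown") -> int:
    """Run the batch conversion and return the process exit code."""
    cmd = build_marker_command(input_dir, workers, output_format)
    # An argument list (no shell=True) avoids quoting problems in paths.
    return subprocess.run(cmd, check=False).returncode
```

Validating the arguments before starting the process means a typo in the format name fails immediately instead of after the models have loaded.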

3. Multiple Output Formats

# Markdown (default)
marker_single paper.pdf --output_dir out/ --output_format markdown

# JSON (structured)
marker_single paper.pdf --output_dir out/ --output_format json

# HTML
marker_single paper.pdf --output_dir out/ --output_format html
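The JSON format is useful when a pipeline needs to treat tables, equations, and text differently. The sketch below counts block types in a converted document. It assumes the JSON output is a tree in which each node carries a `block_type` string and an optional `children` list; check that assumption against an actual output file before relying on it. `count_block_types` and `load_and_count` are hypothetical helpers.

```python
import json
from collections import Counter
from typing import Any, Dict


def count_block_types(node: Dict[str, Any]) -> Counter:
    """Count nodes per block_type in a nested JSON tree (assumed layout)."""
    counts: Counter = Counter()
    stack = [node]
    while stack:
        current = stack.pop()
        block_type = current.get("block_type")
        if block_type:
            counts[block_type] += 1
        # "children" may be missing or null on leaf nodes.
        stack.extend(current.get("children") or [])
    return counts


def load_and_count(json_path: str) -> Counter:
    """Load a JSON output file and count its block types."""
    with open(json_path, encoding="utf-8") as handle:
        return count_block_types(json.load(handle))
```

A count like this gives a quick check on whether a document was recognized as containing the tables or equations expected in it.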

4. Language Support

Marker supports 50+ languages with automatic detection. It works especially well on English, Chinese, Japanese, Korean, and European languages.

5. GPU Acceleration

# Marker auto-detects CUDA/MPS; set TORCH_DEVICE to override
# CPU fallback is available but slower
TORCH_DEVICE=cuda marker_single paper.pdf --output_dir out/
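To see which device a machine would use before starting a long run, the availability checks can be reproduced with PyTorch directly. This sketch is not Marker's own detection code. `pick_torch_device` and `marker_env` are hypothetical helpers that mirror the CUDA, then MPS, then CPU order described above and set `TORCH_DEVICE` for a child process.

```python
import os
from typing import Dict


def pick_torch_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to accelerate with.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def marker_env(device: str = "") -> Dict[str, str]:
    """Copy the current environment with TORCH_DEVICE set for a subprocess."""
    env = dict(os.environ)
    env["TORCH_DEVICE"] = device or pick_torch_device()
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching `marker_single`, which has the same effect as the `TORCH_DEVICE=cuda` prefix in the shell example.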

Marker vs Alternatives

| Feature | Marker | PyMuPDF | Zerox | Docling |
|---|---|---|---|---|
| Tables | Deep learning | Rule-based | Vision LLM | Deep learning |
| Equations | LaTeX output | Text only | Depends on LLM | Limited |
| Scanned PDFs | Built-in OCR | No | Yes (via LLM) | Yes |
| Speed | Fast (GPU) | Very fast | Slow (API calls) | Moderate |
| Cost | Free (local) | Free | API costs | Free |
| Accuracy | Very high | Moderate | High | High |

FAQ

Q: How does it compare to Zerox? A: Marker runs locally with no API costs and is much faster for batch processing. Zerox uses vision LLMs (such as GPT-4o), which are billed per page but can handle edge cases better.

Q: Does it work on scanned PDFs? A: Yes, it includes built-in OCR based on deep learning models.

Q: What hardware do I need? A: A GPU is recommended for speed (NVIDIA CUDA or Apple MPS). CPU works but is 5-10x slower.

Source & Thanks

Created by VikParuchuri. Licensed under GPL-3.0.

VikParuchuri/marker — 19k+ stars
