Scripts · Mar 30, 2026 · 2 min read

Marker — Convert PDF to Markdown with High Accuracy

Fast, accurate PDF to Markdown + JSON converter. Handles tables, images, equations, code blocks, and multi-column layouts. GPU-accelerated. 33K+ GitHub stars.

Introduction

Marker converts PDF files to Markdown and JSON quickly and with high accuracy. It handles complex layouts, including tables, images, equations, code blocks, multi-column text, headers/footers, and footnotes. GPU acceleration speeds up batch processing, and the Surya OCR engine provides multi-language support. The project has 33,000+ GitHub stars.

Best for: RAG pipelines, document ingestion, PDF data extraction, knowledge base building
Works with: any LLM pipeline, including LangChain, LlamaIndex, Haystack, and custom RAG systems
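In a RAG pipeline, the Markdown that Marker produces usually has to be split into passages before it is embedded. Below is a minimal sketch, assuming the converted document uses ATX-style `#` headings. `chunk_markdown` and `Chunk` are hypothetical helpers written for illustration; they are not part of Marker or of any RAG framework.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    """A run of Markdown text plus the heading path it sits under."""
    headings: list[str]
    text: str


# ATX heading: 1-6 '#' characters, whitespace, then the title.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def chunk_markdown(markdown: str) -> list[Chunk]:
    """Split Markdown into chunks at headings (hypothetical helper).

    Each chunk records the stack of headings above it, so a retriever can
    show where a passage came from. Lines inside ``` fences are never
    treated as headings.
    """
    chunks: list[Chunk] = []
    stack: list[tuple[int, str]] = []  # (level, title) of open headings
    buffer: list[str] = []
    in_fence = False

    def flush() -> None:
        body = "\n".join(buffer).strip()
        if body:
            chunks.append(Chunk([title for _, title in stack], body))
        buffer.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Close headings at the same or a deeper level before opening this one.
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, match.group(2)))
        else:
            buffer.append(line)
    flush()
    return chunks
```

Keeping the heading path with each chunk is a common design choice: it lets you prepend "Intro > Setup" to the passage at embedding time, which often helps retrieval on short sections.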


Key Features

Accurate Conversion

  • Tables — Preserved as Markdown tables with alignment
  • Images — Extracted and saved as separate files
  • Equations — Converted to LaTeX notation
  • Code blocks — Detected and formatted with syntax highlighting
  • Multi-column — Correctly reads multi-column layouts in order
  • Headers/footers — Automatically removed
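Because tables are emitted as Markdown tables, downstream code can read them back into records. Below is a minimal sketch, assuming a GitHub-style pipe table with a header row followed by a separator row. `parse_markdown_table` is a hypothetical helper, not a Marker function, and it does not handle escaped pipes (`\|`) inside cells.

```python
def parse_markdown_table(table: str) -> list[dict[str, str]]:
    """Turn a pipe table into a list of row dicts keyed by header (hypothetical helper)."""

    def cells(line: str) -> list[str]:
        # Drop the optional leading/trailing pipe, then split on the rest.
        line = line.strip()
        if line.startswith("|"):
            line = line[1:]
        if line.endswith("|"):
            line = line[:-1]
        return [c.strip() for c in line.split("|")]

    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header row and a separator row")
    header = cells(lines[0])
    # lines[1] is the separator row (e.g. |:---|---:|); it only carries alignment.
    rows = []
    for line in lines[2:]:
        values = cells(line)
        # Pad short rows so every header gets a value; zip drops extra cells.
        values += [""] * (len(header) - len(values))
        rows.append(dict(zip(header, values)))
    return rows
```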

Performance

  • GPU-accelerated — 10x faster with CUDA
  • Batch processing — Convert entire directories
  • Multi-language — 90+ languages via Surya OCR engine
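Marker can convert entire directories itself. The sketch below only illustrates the general shape of a resumable batch job wrapped around any single-file converter. `convert_directory` and its `convert` callback are hypothetical: in a real pipeline, `convert` would wrap Marker's Python API, which is not shown here. Note that `rglob("*.pdf")` is case-sensitive on most Linux filesystems, so `.PDF` files would be skipped.

```python
from pathlib import Path
from typing import Callable


def convert_directory(
    src: Path,
    dst: Path,
    convert: Callable[[Path], str],
    skip_existing: bool = True,
) -> list[Path]:
    """Convert every PDF under `src` to a .md file mirrored under `dst`.

    `convert` is any function that takes a PDF path and returns Markdown
    (a hypothetical hook). Returns the paths actually written.
    """
    written: list[Path] = []
    for pdf in sorted(src.rglob("*.pdf")):
        out = dst / pdf.relative_to(src).with_suffix(".md")
        if skip_existing and out.exists():
            continue  # already converted: makes re-runs after a crash cheap
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(convert(pdf), encoding="utf-8")
        written.append(out)
    return written
```

Skipping existing outputs means a long GPU batch can be restarted without redoing finished files.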

Output Formats

  • Markdown (clean, LLM-ready)
  • JSON (structured with metadata)
  • HTML
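JSON output is useful when you need structure rather than just text, for example to see which kinds of blocks a document contains. Below is a minimal sketch, assuming the JSON is a tree of nested objects in which each node has a `block_type` string and an optional `children` list. Check those field names against the files your Marker version actually writes. `iter_blocks` and `block_type_counts` are hypothetical helpers.

```python
from collections import Counter
from typing import Any, Iterator


def iter_blocks(node: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Depth-first walk over a nested block tree: the node, then its children."""
    yield node
    # Leaf blocks may have no "children" key or children set to None.
    for child in node.get("children") or []:
        yield from iter_blocks(child)


def block_type_counts(doc: dict[str, Any]) -> Counter:
    """Count blocks by their assumed `block_type` field across the whole tree."""
    return Counter(b.get("block_type", "Unknown") for b in iter_blocks(doc))
```

Loading the file with `json.load` and passing the resulting dict to `block_type_counts` gives a quick inventory, such as how many tables or equations were detected per document.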

Comparison

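| Feature | Marker | PyPDF | pdfplumber |
|---|---|---|---|
| Tables | Yes | | |
| Images | Yes | | |
| Equations | Yes | | |
| Multi-column | Yes | | |
| OCR (scanned) | Yes | | |
| Speed (GPU) | Fast | Fast | Medium |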

FAQ

Q: What is Marker? A: A fast, accurate PDF to Markdown converter that handles tables, images, equations, code blocks, and multi-column layouts. GPU-accelerated with 90+ language support. 33K+ GitHub stars.

Q: Can Marker handle scanned PDFs? A: Yes, it includes OCR via the Surya engine, supporting 90+ languages for both native and scanned PDFs.



Source and acknowledgments

Created by Datalab. Licensed under GPL-3.0. Repository: datalab-to/marker (33,000+ GitHub stars)
