# Marker — Convert PDF to Markdown for AI Tools

> High-accuracy PDF to Markdown converter optimized for AI pipelines. Marker handles tables, equations, code blocks, and multi-column layouts with deep learning OCR.

## Install

```bash
pip install marker-pdf
```

## Quick Use

```bash
# Convert a single PDF
marker_single input.pdf --output_dir output_dir/

# Convert a directory of PDFs with 4 parallel workers
marker input_dir/ --output_dir output_dir/ --workers 4
```

```python
from marker.converters.pdf import PdfConverter
from marker.models import create_model_dict

# Load the layout/OCR/table models once (weights download on first run)
models = create_model_dict()
converter = PdfConverter(artifact_dict=models)

# Convert one file; the result exposes the Markdown text as .markdown
rendered = converter("paper.pdf")
print(rendered.markdown)
```

## What is Marker?

Marker is a deep learning PDF-to-Markdown converter designed for AI pipelines. It accurately extracts text, tables, equations, code blocks, and images from PDFs, including scanned documents.

Unlike rule-based tools, Marker uses trained models for layout detection, OCR, table recognition, and equation conversion, which gives it significantly higher accuracy on complex academic and technical documents.

**Answer-Ready**: Marker converts PDFs to clean Markdown using deep learning. It handles tables, equations, code blocks, multi-column layouts, and scanned documents, runs about 10x faster than similar tools, and reaches 90%+ accuracy on academic papers. It is used in RAG pipelines for document ingestion. 19k+ GitHub stars.

**Best for**: AI teams building RAG pipelines or processing technical PDFs.

**Works with**: Any LLM framework, including LangChain and LlamaIndex.

**Setup time**: Under 3 minutes.

## Core Features

### 1. High-Accuracy Extraction

| Element | Accuracy |
|---------|----------|
| Body text | 95%+ |
| Tables | 90%+ |
| Equations (LaTeX) | 85%+ |
| Code blocks | 90%+ |
| Multi-column | 90%+ |

### 2. Batch Processing

```bash
# Convert every PDF in input_dir/ (e.g. 1000 files) using 8 parallel workers
marker input_dir/ --workers 8 --output_format markdown
```
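When batch conversion runs through the Python API instead of the CLI, the models only need to be loaded once and the same converter can be reused for every file. A minimal sketch of that pattern follows, assuming a flat directory of lowercase `.pdf` files. `convert_directory` and `make_marker_convert` are hypothetical helper names, not part of Marker; only `PdfConverter`, `create_model_dict`, and `.markdown` come from the Quick Use example. The conversion function is passed in as a parameter so the loop can be exercised without downloading any models.

```python
from pathlib import Path
from typing import Callable, List


def convert_directory(
    input_dir: Path,
    output_dir: Path,
    convert: Callable[[str], str],
) -> List[Path]:
    """Convert every *.pdf in input_dir (non-recursive) into one .md file each.

    `convert` is any function that takes a PDF path and returns Markdown text
    (hypothetical interface, chosen so the loop is testable without Marker).
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    # Sorted for a deterministic processing order; glob is case-sensitive on Linux
    for pdf_path in sorted(input_dir.glob("*.pdf")):
        markdown = convert(str(pdf_path))
        out_path = output_dir / f"{pdf_path.stem}.md"
        out_path.write_text(markdown, encoding="utf-8")
        written.append(out_path)
    return written


def make_marker_convert() -> Callable[[str], str]:
    """Build a convert function backed by Marker; models are loaded only once."""
    # Imported lazily so convert_directory works even without marker-pdf installed
    from marker.converters.pdf import PdfConverter
    from marker.models import create_model_dict

    converter = PdfConverter(artifact_dict=create_model_dict())
    return lambda path: converter(path).markdown
```

Usage, assuming `marker-pdf` is installed: `convert_directory(Path("input_dir"), Path("output_dir"), make_marker_convert())`. This sketch converts files sequentially in one process; for parallel conversion, the CLI's `--workers` flag is the simpler route.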
### 3. Multiple Output Formats

```bash
# Markdown (default)
marker_single paper.pdf --output_dir out/ --output_format markdown

# JSON (structured)
marker_single paper.pdf --output_dir out/ --output_format json

# HTML
marker_single paper.pdf --output_dir out/ --output_format html
```

### 4. Language Support

Supports 50+ languages with automatic detection. Works especially well on English, Chinese, Japanese, Korean, and European languages.

### 5. GPU Acceleration

```bash
# Marker auto-detects CUDA/MPS and falls back to CPU (slower).
# Set TORCH_DEVICE to force a specific device:
TORCH_DEVICE=cuda marker_single paper.pdf --output_dir out/
```

## Marker vs Alternatives

| Feature | Marker | PyMuPDF | Zerox | Docling |
|---------|--------|---------|-------|---------|
| Tables | Deep learning | Rule-based | Vision LLM | Deep learning |
| Equations | LaTeX output | Text only | Depends on LLM | Limited |
| Scanned PDFs | Built-in OCR | No | Yes (via LLM) | Yes |
| Speed | Fast (GPU) | Very fast | Slow (API calls) | Moderate |
| Cost | Free (local) | Free | API costs | Free |
| Accuracy | Very high | Moderate | High | High |

## FAQ

**Q: How does it compare to Zerox?**

A: Marker runs locally with no API costs and is much faster for batch processing. Zerox uses vision LLMs (such as GPT-4o), which cost money per page but can handle some edge cases better.

**Q: Does it work on scanned PDFs?**

A: Yes, it includes built-in OCR based on deep learning models.

**Q: What hardware do I need?**

A: A GPU is recommended for speed (NVIDIA CUDA or Apple MPS). CPU works but is 5-10x slower.

## Source & Thanks

> Created by [VikParuchuri](https://github.com/VikParuchuri). Licensed under GPL-3.0.
>
> [VikParuchuri/marker](https://github.com/VikParuchuri/marker) — 19k+ stars

---

Source: https://tokrepo.com/en/workflows/cfafd4e4-d57d-481d-a44c-10f1c8a66cb0

Author: Script Depot