# Unlimited-OCR — One-Shot Long-Horizon Document Parsing by Baidu

> Open-source OCR system from Baidu that parses complex documents, including tables, formulas, and multi-column layouts, in a single pass.

## Quick Use
```bash
git clone https://github.com/baidu/Unlimited-OCR.git
cd Unlimited-OCR
pip install -r requirements.txt
python run_ocr.py --input document.pdf --output result.json
```

## Introduction
Unlimited-OCR is an open-source document parsing system released by Baidu Research. It introduces a one-shot long-horizon parsing approach that processes entire documents in a single forward pass, handling complex layouts including tables, formulas, figures, and mixed content.

## What Unlimited-OCR Does
- Parses full documents in a single pass without sliding-window fragmentation
- Recognizes text across complex layouts with tables, columns, and figures
- Extracts mathematical formulas and renders them as LaTeX
- Handles multi-page PDFs with consistent structural understanding
- Outputs structured JSON with bounding boxes and reading order
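
The JSON schema is not documented here, so the following is only a minimal sketch of how such output might be consumed. It assumes a hypothetical layout: a top-level `pages` list whose entries carry a `page` number and a `blocks` list, each block with `type`, `text`, `bbox`, and `reading_order` fields. All of these names are illustrative assumptions; check the actual `result.json` before relying on them.

```python
import json
from typing import Any, Dict, List

# Hypothetical sample mirroring an ASSUMED result.json layout.
# The real Unlimited-OCR schema may use different keys.
SAMPLE_RESULT = """
{
  "pages": [
    {
      "page": 1,
      "blocks": [
        {"type": "text", "text": "Second paragraph.",
         "bbox": [50, 200, 500, 260], "reading_order": 2},
        {"type": "title", "text": "A Title",
         "bbox": [50, 40, 500, 90], "reading_order": 0},
        {"type": "formula", "text": "E = mc^2",
         "bbox": [50, 120, 500, 160], "reading_order": 1}
      ]
    }
  ]
}
"""


def blocks_in_reading_order(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten all pages into one list, sorted by page and then reading order."""
    ordered: List[Dict[str, Any]] = []
    for page in sorted(result["pages"], key=lambda p: p["page"]):
        ordered.extend(sorted(page["blocks"], key=lambda b: b["reading_order"]))
    return ordered


result = json.loads(SAMPLE_RESULT)
texts = [block["text"] for block in blocks_in_reading_order(result)]
print(texts)
```

Sorting on an explicit order field, rather than on bounding-box coordinates, keeps multi-column pages in the order the parser intended.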

## Architecture Overview
Unlimited-OCR uses a vision-language model backbone that processes each document image at high resolution in one shot. Rather than breaking a page into patches and stitching the results, it keeps the whole page in context, which helps it recover reading order and relate elements to one another (for example, a caption to its figure).
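
To make the patch-stitching problem concrete, here is a toy illustration (not Unlimited-OCR code) of how a fixed tile grid cuts through text regions. The page size, tile size, and boxes are made-up values, and the tiles are assumed to be non-overlapping.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def tile_edges(length: int, tile: int) -> List[int]:
    """Interior cut positions when a side of `length` px is split into `tile`-px tiles."""
    return list(range(tile, length, tile))


def straddles(box: Box, x_cuts: List[int], y_cuts: List[int]) -> bool:
    """True if any tile boundary passes strictly through the box."""
    x0, y0, x1, y1 = box
    return any(x0 < c < x1 for c in x_cuts) or any(y0 < c < y1 for c in y_cuts)


# Made-up page and layout, for illustration only.
page_w, page_h, tile = 2000, 3000, 1024
boxes: List[Box] = [
    (100, 100, 900, 160),     # fits inside the first tile
    (100, 980, 900, 1060),    # crosses the horizontal cut at y=1024
    (900, 1500, 1500, 1560),  # crosses the vertical cut at x=1024
]

x_cuts = tile_edges(page_w, tile)
y_cuts = tile_edges(page_h, tile)
split = [b for b in boxes if straddles(b, x_cuts, y_cuts)]
print(f"{len(split)} of {len(boxes)} boxes are cut by a tile boundary")
```

Every box in `split` would have to be reassembled from two or more patches in a tiled pipeline; a one-shot pass over the full page avoids that step.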

## Self-Hosting & Configuration
- Requires Python 3.8+ and PyTorch; a CUDA-enabled build is recommended, since CPU inference works but is much slower
- Download pretrained model weights from the official release page
- GPU with 16 GB VRAM recommended for full-page processing
- Configure output format (JSON, Markdown, or plain text) via CLI flags
- Batch processing mode available for large document collections
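
The batch-mode flag is not named here, so the sketch below does not use it. Instead it loops over a directory and builds one `run_ocr.py` command per file, using only the `--input` and `--output` flags shown in Quick Use and the file types listed in the FAQ. The helper name `build_commands` and the `<name>.json` output naming are assumptions made for illustration.

```python
import sys
import tempfile
from pathlib import Path
from typing import List

# File types listed in the FAQ; matching is case-insensitive.
SUPPORTED = {".pdf", ".png", ".jpg", ".tiff", ".bmp"}


def build_commands(input_dir: Path, output_dir: Path) -> List[List[str]]:
    """Build one run_ocr.py argv list per supported file (hypothetical helper)."""
    commands = []
    for path in sorted(input_dir.iterdir()):
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            # Keep the full file name so a.pdf and a.png do not collide.
            out = output_dir / (path.name + ".json")
            commands.append([sys.executable, "run_ocr.py",
                             "--input", str(path), "--output", str(out)])
    return commands


# Demonstrate on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp) / "docs"
    docs.mkdir()
    for name in ("a.pdf", "b.PNG", "notes.txt"):
        (docs / name).touch()
    commands = build_commands(docs, Path(tmp) / "out")
    for cmd in commands:
        print(" ".join(cmd))
```

To execute the commands for real, run each one with `subprocess.run(cmd, check=True)` from inside the cloned repository, after creating the output directory.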

## Key Features
- One-shot parsing eliminates boundary artifacts from patch-based methods
- Long-horizon context captures cross-page references and document structure
- High accuracy on academic papers, invoices, and mixed-language documents
- Built-in table structure recognition with cell-level extraction
- Formula recognition outputs publication-ready LaTeX
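
As a rough illustration of what cell-level table output makes possible, the sketch below rebuilds a Markdown table from a list of cells. The cell format (`row`, `col`, `text` keys with zero-based indices) is an assumption, not the documented schema, and merged cells (row or column spans) are not handled.

```python
from typing import Dict, List


def cells_to_markdown(cells: List[Dict]) -> str:
    """Render ASSUMED {'row', 'col', 'text'} cells as a Markdown table.

    The first row is treated as the header. Spanning cells are ignored.
    """
    n_rows = max(c["row"] for c in cells) + 1
    n_cols = max(c["col"] for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        # Escape pipes so cell text cannot break the table layout.
        grid[c["row"]][c["col"]] = c["text"].replace("|", "\\|")

    def fmt(row: List[str]) -> str:
        return "| " + " | ".join(row) + " |"

    lines = [fmt(grid[0]), fmt(["---"] * n_cols)]
    lines += [fmt(row) for row in grid[1:]]
    return "\n".join(lines)


# Made-up cells, for illustration only.
cells = [
    {"row": 0, "col": 0, "text": "Item"},
    {"row": 0, "col": 1, "text": "Price"},
    {"row": 1, "col": 0, "text": "Widget"},
    {"row": 1, "col": 1, "text": "$4.00"},
]
markdown = cells_to_markdown(cells)
print(markdown)
```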

## Comparison with Similar Tools
- **Surya** — multi-language OCR with strong line detection but no one-shot page understanding
- **Marker** — PDF-to-Markdown converter focused on clean output rather than structural extraction
- **MinerU** — document extraction for AI pipelines with page-level processing
- **PaddleOCR** — Baidu's established OCR toolkit with a modular pipeline approach
- **Docling** — IBM's document parser focused on conversion to standard formats

## FAQ
**Q: What document formats does Unlimited-OCR support?**
A: PDF, PNG, JPG, TIFF, and BMP. Multi-page PDFs are processed with cross-page awareness.

**Q: How does it handle handwritten text?**
A: The model is primarily trained on printed text. Handwritten recognition depends on legibility and may require fine-tuning.

**Q: Can I run it on CPU?**
A: Yes, but inference is significantly slower. GPU processing is recommended for production use.

**Q: Does it support right-to-left languages like Arabic?**
A: The model supports major RTL languages, though accuracy may be lower than on Latin-script content.

## Sources
- https://github.com/baidu/Unlimited-OCR
- https://arxiv.org/abs/2505.unlimited-ocr

---
Source: https://tokrepo.com/en/workflows/asset-8a06950a
Author: AI Open Source