Unlimited-OCR — One-Shot Long-Horizon Document Parsing by Baidu

Introduction

Unlimited-OCR is an open-source document parsing system released by Baidu Research. It introduces a one-shot long-horizon parsing approach that processes entire documents in a single forward pass, handling complex layouts including tables, formulas, figures, and mixed content.

What Unlimited-OCR Does

Parses full documents in a single pass without sliding-window fragmentation
Recognizes text across complex layouts with tables, columns, and figures
Extracts mathematical formulas and renders them as LaTeX
Handles multi-page PDFs with consistent structural understanding
Outputs structured JSON with bounding boxes and reading order
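
As an illustration of how output of this kind might be consumed, the sketch below sorts parsed elements by reading order and joins their text. The field names (elements, type, bbox, reading_order, text) are assumptions made for this example; the actual Unlimited-OCR schema is not described here and may differ.

```python
import json

# Hypothetical sample output; the field names are assumptions, not a documented schema.
SAMPLE = """
{
  "page": 1,
  "elements": [
    {"type": "paragraph", "bbox": [72, 140, 540, 220], "reading_order": 1, "text": "Body text."},
    {"type": "title", "bbox": [72, 60, 540, 110], "reading_order": 0, "text": "A Title"},
    {"type": "formula", "bbox": [150, 240, 460, 290], "reading_order": 2, "text": "E = mc^2"}
  ]
}
"""


def elements_in_reading_order(page):
    """Return the page's elements sorted by their reading-order index."""
    return sorted(page["elements"], key=lambda e: e["reading_order"])


def page_to_text(page):
    """Join element text in reading order, one element per line."""
    return "\n".join(e["text"] for e in elements_in_reading_order(page))


page = json.loads(SAMPLE)
print(page_to_text(page))
```

Keeping the reading-order index separate from the bounding box means downstream code does not have to guess the order from coordinates, which tends to fail on multi-column layouts.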

Architecture Overview

Unlimited-OCR uses a vision-language model backbone that processes document images at high resolution in one shot. Rather than breaking pages into patches and stitching results, it maintains global context across the entire page, enabling accurate reading order detection and cross-element relationship understanding.
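
To make the contrast with patch-based processing concrete, the following sketch tiles a page into fixed-size patches and checks which layout elements straddle a patch boundary. It illustrates the fragmentation problem that one-shot processing is meant to avoid; it is not Unlimited-OCR code, and the tile size and boxes are made-up values.

```python
def tile_index(x, y, tile):
    """Return the (column, row) of the tile that contains pixel (x, y)."""
    return (x // tile, y // tile)


def tiles_touched(bbox, tile):
    """Return the set of tiles that a box (x0, y0, x1, y1) overlaps."""
    x0, y0, x1, y1 = bbox
    c0, r0 = tile_index(x0, y0, tile)
    # Subtract 1 so a box ending exactly on a boundary is not counted in the next tile.
    c1, r1 = tile_index(x1 - 1, y1 - 1, tile)
    return {(c, r) for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)}


def is_fragmented(bbox, tile):
    """A box is fragmented when it overlaps more than one tile."""
    return len(tiles_touched(bbox, tile)) > 1


# Made-up example: two elements on a page tiled at 512 px.
TILE = 512
elements = {
    "heading": (40, 30, 480, 90),   # fits inside one tile
    "table": (40, 400, 900, 700),   # crosses tile boundaries in both directions
}
for name, bbox in elements.items():
    print(name, "fragmented" if is_fragmented(bbox, TILE) else "intact")
```

A patch-based pipeline would have to recognize each piece of the table separately and stitch the pieces back together; a model that sees the whole page has no such seam to repair.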

Self-Hosting & Configuration

Requires Python 3.8+ and PyTorch; a CUDA-enabled build is needed for GPU inference
Download pretrained model weights from the official release page
GPU with 16 GB VRAM recommended for full-page processing
Configure output format (JSON, Markdown, or plain text) via CLI flags
Batch processing mode available for large document collections
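
Before downloading weights, it can help to confirm that the machine matches the requirements listed above. The sketch below is one possible pre-flight check, not part of Unlimited-OCR: the function names are hypothetical, and it relies only on the standard library plus PyTorch's torch.cuda calls when PyTorch is present.

```python
import sys

MIN_PYTHON = (3, 8)
RECOMMENDED_VRAM_GB = 16


def check_environment():
    """Collect basic facts about the local setup as a dict."""
    try:
        import torch  # optional here so the check also runs before PyTorch is installed
    except ImportError:
        torch = None

    report = {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "torch_installed": torch is not None,
        "cuda_available": False,
        "vram_gb": None,
    }
    if torch is not None and torch.cuda.is_available():
        report["cuda_available"] = True
        total_bytes = torch.cuda.get_device_properties(0).total_memory
        report["vram_gb"] = round(total_bytes / 1024**3, 1)
    return report


def meets_recommendation(report):
    """True when a CUDA GPU with roughly the recommended VRAM was found."""
    # Cards sold as 16 GB report slightly less, so round to the nearest GiB.
    vram = round(report["vram_gb"] or 0)
    return bool(report["cuda_available"]) and vram >= RECOMMENDED_VRAM_GB


if __name__ == "__main__":
    print(check_environment())
```

The check only inspects the first GPU (device 0); a multi-GPU host would need to loop over torch.cuda.device_count().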

Key Features

One-shot parsing eliminates boundary artifacts from patch-based methods
Long-horizon context captures cross-page references and document structure
High accuracy on academic papers, invoices, and mixed-language documents
Built-in table structure recognition with cell-level extraction
Formula recognition outputs publication-ready LaTeX
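
Cell-level table extraction is easiest to appreciate with a small example. The sketch below assumes each cell arrives as a record with row, col, and text fields (a hypothetical shape chosen for illustration, not the tool's documented output) and rebuilds a Markdown table from those records.

```python
def cells_to_grid(cells):
    """Arrange cell dicts into a row-major grid, filling gaps with empty strings."""
    n_rows = max(c["row"] for c in cells) + 1
    n_cols = max(c["col"] for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c["row"]][c["col"]] = c["text"]
    return grid


def grid_to_markdown(grid):
    """Render a grid as a Markdown table, treating the first row as the header."""
    header, *body = grid
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


# Hypothetical cell records; the real output schema may differ.
cells = [
    {"row": 0, "col": 0, "text": "Item"},
    {"row": 0, "col": 1, "text": "Price"},
    {"row": 1, "col": 0, "text": "Widget"},
    {"row": 1, "col": 1, "text": "9.50"},
    {"row": 2, "col": 0, "text": "Gadget"},
]
print(grid_to_markdown(cells_to_grid(cells)))
```

Merged cells (row or column spans) are not handled here; a fuller version would need span fields on each record.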

Comparison with Similar Tools

Surya — multi-language OCR with strong line detection but no one-shot page understanding
Marker — PDF-to-Markdown converter focused on clean output rather than structural extraction
MinerU — document extraction for AI pipelines with page-level processing
PaddleOCR — Baidu's established OCR toolkit with a modular pipeline approach
Docling — IBM's document parser focused on conversion to standard formats

FAQ

Q: What document formats does Unlimited-OCR support? A: PDF, PNG, JPG, TIFF, and BMP. Multi-page PDFs are processed with cross-page awareness.

Q: How does it handle handwritten text? A: The model is primarily trained on printed text. Handwritten recognition depends on legibility and may require fine-tuning.

Q: Can I run it on CPU? A: Yes, but inference is significantly slower. GPU processing is recommended for production use.

Q: Does it support right-to-left languages like Arabic? A: The model supports major RTL languages, though accuracy may vary compared to Latin-script content.
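
For batch jobs, the formats named in the first FAQ answer can be used to pre-filter a directory before anything is sent to the parser. The sketch below is a generic helper written for this page, not an Unlimited-OCR API: the function names are hypothetical, and the parsing call itself is left out because its interface is not described here.

```python
from pathlib import Path

# Extensions taken from the FAQ answer; matching is case-insensitive.
SUPPORTED = {".pdf", ".png", ".jpg", ".tiff", ".bmp"}


def find_documents(root):
    """Return supported files under root, sorted for a stable processing order."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

A typical loop would be `for batch in batches(find_documents("scans"), 8): ...`, with the directory name and batch size chosen to suit the available GPU memory. Common variants such as .jpeg and .tif are not in the set because the FAQ does not list them; add them if the installed version accepts them.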
