Configs · Jul 5, 2026 · 3 min read

Unlimited-OCR — One-Shot Long-Horizon Document Parsing by Baidu

Open-source OCR system from Baidu that parses complex documents in a single pass with high accuracy across diverse layouts.

Agent ready

Agent-ready installation

This asset can be installed after you choose a runtime, review the plan, and run the matching command.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: Unlimited-OCR Overview
Direct install command:
npx -y tokrepo@latest install 8a06950a-7808-11f1-9bc6-00163e2b0d79 --target codex

Run this after confirming the plan in a dry run.

Introduction

Unlimited-OCR is an open-source document parsing system released by Baidu Research. It introduces a one-shot long-horizon parsing approach that processes entire documents in a single forward pass, handling complex layouts including tables, formulas, figures, and mixed content.

What Unlimited-OCR Does

  • Parses full documents in a single pass without sliding-window fragmentation
  • Recognizes text across complex layouts with tables, columns, and figures
  • Extracts mathematical formulas and renders them as LaTeX
  • Handles multi-page PDFs with consistent structural understanding
  • Outputs structured JSON with bounding boxes and reading order
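
As a rough illustration of consuming output that carries bounding boxes and a reading order, the sketch below sorts blocks back into reading sequence. The schema is hypothetical: the field names `pages`, `blocks`, `bbox`, `reading_order`, and `text` are assumptions made for this example, and the actual Unlimited-OCR output may be shaped differently.

```python
import json

# Hypothetical output shape, invented for this example.
# The real Unlimited-OCR JSON schema may use different keys.
SAMPLE = """
{
  "pages": [
    {
      "page": 1,
      "blocks": [
        {"type": "text", "bbox": [50, 120, 550, 180], "reading_order": 1, "text": "Body paragraph."},
        {"type": "title", "bbox": [50, 40, 550, 90], "reading_order": 0, "text": "Document Title"},
        {"type": "formula", "bbox": [100, 200, 500, 240], "reading_order": 2, "text": "E = mc^2"}
      ]
    }
  ]
}
"""


def blocks_in_reading_order(document):
    """Return (page, type, text) tuples sorted by page, then by reading order."""
    ordered = []
    for page in sorted(document["pages"], key=lambda p: p["page"]):
        for block in sorted(page["blocks"], key=lambda b: b["reading_order"]):
            ordered.append((page["page"], block["type"], block["text"]))
    return ordered


document = json.loads(SAMPLE)
ordered_blocks = blocks_in_reading_order(document)
for page_number, block_type, text in ordered_blocks:
    print(f"p{page_number} [{block_type}] {text}")
```

Sorting on an explicit order field, rather than on bounding-box coordinates, is what lets multi-column layouts come out in the intended sequence.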

Architecture Overview

Unlimited-OCR uses a vision-language model backbone that processes document images at high resolution in one shot. Rather than breaking pages into patches and stitching results, it maintains global context across the entire page, enabling accurate reading order detection and cross-element relationship understanding.
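
To make the patch-and-stitch problem concrete, the sketch below counts how many layout elements a fixed tile grid would cut through. It illustrates patch-based pipelines in general and says nothing about Unlimited-OCR's internals; the page size, tile size, and boxes are made-up values.

```python
def tile_edges(page_size, tile_size):
    """Interior cut positions when one page axis is split into fixed-size tiles."""
    return list(range(tile_size, page_size, tile_size))


def boxes_cut_by_tiling(boxes, page_width, page_height, tile_size):
    """Return the boxes (x0, y0, x1, y1) that cross at least one tile boundary."""
    x_cuts = tile_edges(page_width, tile_size)
    y_cuts = tile_edges(page_height, tile_size)
    cut = []
    for x0, y0, x1, y1 in boxes:
        crosses_x = any(x0 < c < x1 for c in x_cuts)
        crosses_y = any(y0 < c < y1 for c in y_cuts)
        if crosses_x or crosses_y:
            cut.append((x0, y0, x1, y1))
    return cut


# Illustrative layout elements on a 1000 x 1400 pixel page.
boxes = [
    (50, 40, 450, 90),     # short heading
    (50, 120, 950, 180),   # full-width paragraph line
    (100, 480, 900, 560),  # table spanning the page
]
split = boxes_cut_by_tiling(boxes, page_width=1000, page_height=1400, tile_size=512)
print(f"{len(split)} of {len(boxes)} boxes cross a tile boundary")
```

Every element in `split` would have to be reassembled from two or more tiles in a patch-based pipeline, which is the stitching step a whole-page pass avoids.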

Self-Hosting & Configuration

  • Requires Python 3.8+ and PyTorch; a CUDA-enabled build is needed for GPU inference
  • Download pretrained model weights from the official release page
  • GPU with 16 GB VRAM recommended for full-page processing
  • Configure output format (JSON, Markdown, or plain text) via CLI flags
  • Batch processing mode available for large document collections
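
Before downloading weights, it can help to check a machine against the requirements listed above. The following is a minimal sketch that uses only the standard library and, if installed, PyTorch's `torch.cuda` API. The function name `check_environment` and the 5% VRAM tolerance are choices made for this example and are not part of Unlimited-OCR.

```python
import sys

MIN_PYTHON = (3, 8)
RECOMMENDED_VRAM_GB = 16


def check_environment():
    """Report whether this machine meets the stated Python, PyTorch, and GPU needs."""
    report = {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "torch_installed": False,
        "cuda_available": False,
        "vram_gb": None,
        "vram_ok": False,
    }
    try:
        import torch
    except ImportError:
        return report  # PyTorch is missing; the GPU checks cannot run.

    report["torch_installed"] = True
    if torch.cuda.is_available():
        report["cuda_available"] = True
        total_bytes = torch.cuda.get_device_properties(0).total_memory
        report["vram_gb"] = round(total_bytes / 1024**3, 1)
        # Cards sold as 16 GB usually report slightly less, so allow 5% slack
        # (an assumption made for this sketch).
        report["vram_ok"] = report["vram_gb"] >= RECOMMENDED_VRAM_GB * 0.95
    return report


report = check_environment()
for key, value in report.items():
    print(f"{key}: {value}")
```

The check inspects only the first CUDA device; multi-GPU hosts would need to loop over `torch.cuda.device_count()`.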

Key Features

  • One-shot parsing eliminates boundary artifacts from patch-based methods
  • Long-horizon context captures cross-page references and document structure
  • High accuracy on academic papers, invoices, and mixed-language documents
  • Built-in table structure recognition with cell-level extraction
  • Formula recognition outputs publication-ready LaTeX
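
Cell-level table output lends itself to simple downstream conversion. The sketch below renders a list of cells as a Markdown table. The cell fields `row`, `col`, and `text` are hypothetical and chosen for illustration; the real output may also carry row or column spans, which this example does not handle.

```python
def cells_to_markdown(cells):
    """Render cells into a Markdown table, treating row 0 as the header."""
    n_rows = max(cell["row"] for cell in cells) + 1
    n_cols = max(cell["col"] for cell in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for cell in cells:
        # Escape pipes so cell text cannot break the table layout.
        grid[cell["row"]][cell["col"]] = cell["text"].replace("|", "\\|")

    lines = [
        "| " + " | ".join(grid[0]) + " |",
        "| " + " | ".join("---" for _ in range(n_cols)) + " |",
    ]
    for row in grid[1:]:
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


# Hypothetical cell records, as they might appear after table recognition.
cells = [
    {"row": 0, "col": 0, "text": "Item"},
    {"row": 0, "col": 1, "text": "Amount"},
    {"row": 1, "col": 0, "text": "Hosting"},
    {"row": 1, "col": 1, "text": "120.00"},
]
markdown_table = cells_to_markdown(cells)
print(markdown_table)
```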

Comparison with Similar Tools

  • Surya — multi-language OCR with strong line detection but no one-shot page understanding
  • Marker — PDF-to-Markdown converter focused on clean output rather than structural extraction
  • MinerU — document extraction for AI pipelines with page-level processing
  • PaddleOCR — Baidu's established OCR toolkit with a modular pipeline approach
  • Docling — IBM's document parser focused on conversion to standard formats

FAQ

Q: What document formats does Unlimited-OCR support? A: PDF, PNG, JPG, TIFF, and BMP. Multi-page PDFs are processed with cross-page awareness.
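
When preparing a batch, a small filter can select only files in the formats named in the answer above. This sketch uses the standard `pathlib` module. Treating `.jpeg` and `.tif` as accepted variants is an assumption, as are the helper names `is_supported` and `collect_inputs`.

```python
from pathlib import Path

# Formats named in the FAQ, plus the common .jpeg/.tif spellings (an assumption).
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tiff", ".tif", ".bmp"}


def is_supported(path):
    """True if the file extension matches a supported format, ignoring case."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES


def collect_inputs(root):
    """Recursively gather supported files under root, in sorted order."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file() and is_supported(p))


candidates = ["report.PDF", "scan_001.tiff", "notes.docx", "photo.jpg", "README"]
accepted = [name for name in candidates if is_supported(name)]
print(accepted)
```

Calling `collect_inputs("some/folder")` applies the same filter to a directory tree.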

Q: How does it handle handwritten text? A: The model is primarily trained on printed text. Handwritten recognition depends on legibility and may require fine-tuning.

Q: Can I run it on CPU? A: Yes, but inference is significantly slower. GPU processing is recommended for production use.

Q: Does it support right-to-left languages like Arabic? A: The model supports major RTL languages, though accuracy may vary compared to Latin-script content.
