Scripts · May 31, 2026 · 3 min read

Chandra — OCR Model for Complex Tables, Forms, and Handwriting

High-accuracy OCR model that handles structured documents with complex tables, nested forms, and handwritten annotations while preserving full layout fidelity.

Agent ready

Agent-ready installation

This asset can be installed after choosing the runtime, reviewing the plan, and running the corresponding command.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry: Chandra
Direct install command:
npx -y tokrepo@latest install 06d6a932-5ca8-11f1-9bc6-00163e2b0d79 --target codex

Run the command after confirming the plan with a dry run.

Introduction

Chandra is an open-source OCR model built to handle the documents that standard OCR tools struggle with: dense tables with merged cells, multi-column forms, handwritten annotations, and mixed-layout pages. It preserves the full spatial structure of the document, outputting structured data rather than flat text streams.

What Chandra Does

  • Extracts text from complex tables with merged cells, nested headers, and spanning rows
  • Recognizes handwritten text alongside printed content in the same document
  • Preserves document layout including columns, sections, and spatial relationships
  • Outputs structured formats (JSON, Markdown, HTML) that maintain table and form structure
  • Processes scanned PDFs, photographs of documents, and screenshots
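
To make "structured output that maintains table structure" concrete, here is a minimal sketch of how merged cells can survive in HTML output. The `Cell` class and `table_to_html` function are hypothetical names invented for this illustration. They are not part of Chandra's API, and Chandra's real output schema may differ.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Cell:
    """One table cell. Hypothetical structure, not Chandra's schema."""
    text: str
    rowspan: int = 1
    colspan: int = 1
    header: bool = False


def table_to_html(rows: list[list[Cell]]) -> str:
    """Render rows of cells as an HTML table, keeping span attributes."""
    lines = ["<table>"]
    for row in rows:
        lines.append("  <tr>")
        for cell in row:
            tag = "th" if cell.header else "td"
            attrs = ""
            if cell.rowspan > 1:
                attrs += f' rowspan="{cell.rowspan}"'
            if cell.colspan > 1:
                attrs += f' colspan="{cell.colspan}"'
            lines.append(f"    <{tag}{attrs}>{escape(cell.text)}</{tag}>")
        lines.append("  </tr>")
    lines.append("</table>")
    return "\n".join(lines)


# A header block with one cell spanning two rows and one spanning two columns.
rows = [
    [Cell("Region", rowspan=2, header=True), Cell("Sales", colspan=2, header=True)],
    [Cell("Q1", header=True), Cell("Q2", header=True)],
    [Cell("North"), Cell("10"), Cell("12")],
]
html_table = table_to_html(rows)
print(html_table)
```

A flat text stream would lose the fact that "Sales" covers both "Q1" and "Q2". Keeping `rowspan` and `colspan` is what lets a downstream consumer rebuild the grid.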

Architecture Overview

Chandra uses a vision-language model architecture with a layout-aware encoder that segments the document into regions (text blocks, tables, figures, handwriting) before applying specialized decoders for each region type. The table decoder uses a cell-graph approach that explicitly models row and column relationships, while the handwriting decoder uses an attention-based sequence model trained on diverse writing styles.
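
The cell-graph idea can be illustrated with a small sketch. This is one plausible reading of "explicitly models row and column relationships", not Chandra's actual decoder. The `Cell`, `occupancy`, and `cell_graph` names are assumptions made for this example. Each cell records its grid position and spans, and edges link cells that touch horizontally or vertically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A table cell with its top-left grid position and spans (hypothetical)."""
    cell_id: str
    row: int
    col: int
    rowspan: int = 1
    colspan: int = 1


def occupancy(cells):
    """Map each (row, col) grid slot to the id of the cell covering it."""
    grid = {}
    for c in cells:
        for r in range(c.row, c.row + c.rowspan):
            for k in range(c.col, c.col + c.colspan):
                if (r, k) in grid:
                    raise ValueError(f"slot {(r, k)} is covered twice")
                grid[(r, k)] = c.cell_id
    return grid


def cell_graph(cells):
    """Return edges (a, b, relation).

    "row" means b sits directly to the right of a.
    "col" means b sits directly below a.
    """
    grid = occupancy(cells)
    edges = set()
    for (r, k), a in grid.items():
        right = grid.get((r, k + 1))
        below = grid.get((r + 1, k))
        if right is not None and right != a:
            edges.add((a, right, "row"))
        if below is not None and below != a:
            edges.add((a, below, "col"))
    return edges


cells = [
    Cell("region", row=0, col=0, rowspan=2),
    Cell("sales", row=0, col=1, colspan=2),
    Cell("q1", row=1, col=1),
    Cell("q2", row=1, col=2),
]
edges = cell_graph(cells)
for edge in sorted(edges):
    print(edge)
```

Because the merged "sales" header has a "col" edge to both "q1" and "q2", the graph keeps the parent-header relationship that a simple row-by-row text dump would drop.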

Self-Hosting & Configuration

  • Install via pip with Python 3.10+ and PyTorch
  • Download model weights automatically on first run or pre-download for offline use
  • Configure GPU acceleration with CUDA or run on CPU for smaller documents
  • Set output format (JSON, Markdown, HTML) and language preferences
  • Integrate with document processing pipelines via the Python API or CLI
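
The configuration choices above can be sketched as a small settings object. Everything here is illustrative: `OcrConfig`, `pick_device`, and the field names are hypothetical and do not reflect Chandra's real configuration keys, so check the project's documentation for those. The only real library call is `torch.cuda.is_available()`, and the sketch falls back to CPU when PyTorch is not installed.

```python
from dataclasses import dataclass

# Output formats named in this page; the lowercase spellings are an assumption.
OUTPUT_FORMATS = {"json", "markdown", "html"}


def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


@dataclass
class OcrConfig:
    """Hypothetical settings bundle, not Chandra's actual config schema."""
    output_format: str = "json"
    languages: tuple[str, ...] = ("en",)
    device: str = "cpu"

    def __post_init__(self):
        fmt = self.output_format.lower()
        if fmt not in OUTPUT_FORMATS:
            raise ValueError(
                f"unsupported output format {self.output_format!r}; "
                f"expected one of {sorted(OUTPUT_FORMATS)}"
            )
        self.output_format = fmt


config = OcrConfig(
    output_format="Markdown",
    languages=("en", "ja"),
    device=pick_device(),
)
print(config)
```

Validating the format up front means a typo fails immediately instead of after a long OCR run.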

Key Features

  • Table extraction that correctly handles merged cells, multi-line cells, and nested tables
  • Handwriting recognition supporting multiple scripts and writing styles
  • Layout preservation that maintains reading order across complex multi-column pages
  • Batch processing mode for high-throughput document pipelines
  • Language support for documents mixing Latin, CJK, and other scripts
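
Reading order on multi-column pages is easier to picture with a toy example. The sketch below uses a simple geometric heuristic: group blocks into columns by horizontal gaps, then read each column top to bottom. It is a stand-in technique for illustration only, not how Chandra's model orders text. The `Block` class, `reading_order` function, and `column_gap` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A text block with its bounding box in page coordinates (hypothetical)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def reading_order(blocks, column_gap=10.0):
    """Order blocks column by column, left to right, top to bottom in each column.

    A block starts a new column when its left edge lies at least
    `column_gap` units to the right of the current column's right edge.
    """
    columns = []
    for block in sorted(blocks, key=lambda b: b.x0):
        if columns and block.x0 - max(b.x1 for b in columns[-1]) < column_gap:
            columns[-1].append(block)
        else:
            columns.append([block])
    ordered = []
    for column in columns:
        ordered.extend(sorted(column, key=lambda b: b.y0))
    return ordered


# Two columns, supplied in scrambled order.
blocks = [
    Block("C", 310, 100, 550, 280),
    Block("B", 50, 300, 290, 480),
    Block("D", 310, 300, 550, 480),
    Block("A", 50, 100, 290, 280),
]
ordered_texts = [b.text for b in reading_order(blocks)]
print(ordered_texts)
```

This heuristic breaks down when a full-width heading spans both columns, because that block merges everything into one column. Cases like that are why a layout-aware model is useful.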

Comparison with Similar Tools

  • Tesseract — general-purpose OCR; Chandra excels at structured document understanding
  • Surya — focused on multilingual text detection; Chandra adds table and form extraction
  • Nougat — specialized for academic papers; Chandra handles any document type
  • Azure/Google Document AI — cloud services; Chandra runs locally with no API costs

FAQ

Q: Does it require a GPU? A: A GPU is recommended for speed but not required. CPU inference works for smaller documents.

Q: What input formats are supported? A: PDF, PNG, JPEG, TIFF, and BMP. Multi-page PDFs are processed page by page.
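
When feeding a folder into a batch pipeline, it can help to filter files to the listed formats first. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and the mapping from format names to file suffixes (for example `.jpg` and `.tif` as common variants) is an assumption.

```python
from pathlib import Path

# Suffixes assumed to correspond to PDF, PNG, JPEG, TIFF, and BMP.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def is_supported(path) -> bool:
    """Return True when the file suffix matches a listed input format."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES


def collect_inputs(folder):
    """Recursively gather supported files under `folder`, sorted for stable runs."""
    return sorted(
        p for p in Path(folder).rglob("*") if p.is_file() and is_supported(p)
    )


print(is_supported("invoice.PDF"))
print(is_supported("notes.docx"))
```

Sorting the result keeps batch runs reproducible, which makes it simpler to resume after a failure.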

Q: How does it handle rotated or skewed documents? A: Chandra includes automatic deskewing and rotation correction as a preprocessing step.

Q: Can I fine-tune it on my own document types? A: Yes. The training pipeline supports fine-tuning on custom labeled datasets.
