Scripts · May 31, 2026 · 1 min read

Chandra — OCR Model for Complex Tables, Forms, and Handwriting

High-accuracy OCR model that handles structured documents with complex tables, nested forms, and handwritten annotations while preserving full layout fidelity.

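Agent ready

Directly installable by agents

This asset is installable. The agent first selects the current runtime, reviews the install plan, and then runs the matching command.

Native · 98/100 · Policy: allowed
Agent entry: any MCP/CLI agent
Type: Skill
Install: Single
Trust level: Established
Entry: Chandra
Direct install command:
npx -y tokrepo@latest install 06d6a932-5ca8-11f1-9bc6-00163e2b0d79 --target codex

Do a dry run to confirm the install plan before running this command.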

Introduction

Chandra is an open-source OCR model built to handle the documents that standard OCR tools struggle with: dense tables with merged cells, multi-column forms, handwritten annotations, and mixed-layout pages. It preserves the full spatial structure of the document, outputting structured data rather than flat text streams.

What Chandra Does

  • Extracts text from complex tables with merged cells, nested headers, and spanning rows
  • Recognizes handwritten text alongside printed content in the same document
  • Preserves document layout including columns, sections, and spatial relationships
  • Outputs structured formats (JSON, Markdown, HTML) that maintain table and form structure
  • Processes scanned PDFs, photographs of documents, and screenshots
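
To make "structured output" concrete, here is a minimal sketch of how a table with merged cells could be rendered to HTML so that the spans survive. The `Cell` record, its field names, and `table_to_html` are hypothetical and chosen for illustration; they are not Chandra's actual output schema or API.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Cell:
    # Hypothetical cell record; not Chandra's real schema.
    text: str
    rowspan: int = 1
    colspan: int = 1
    header: bool = False


def table_to_html(rows: list[list[Cell]]) -> str:
    """Render rows of cells to an HTML table, keeping merged cells as spans."""
    lines = ["<table>"]
    for row in rows:
        parts = []
        for cell in row:
            tag = "th" if cell.header else "td"
            attrs = ""
            if cell.rowspan > 1:
                attrs += f' rowspan="{cell.rowspan}"'
            if cell.colspan > 1:
                attrs += f' colspan="{cell.colspan}"'
            parts.append(f"<{tag}{attrs}>{escape(cell.text)}</{tag}>")
        lines.append("  <tr>" + "".join(parts) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# A header cell spanning two rows and another spanning two columns.
rows = [
    [Cell("Region", rowspan=2, header=True), Cell("Revenue", colspan=2, header=True)],
    [Cell("Q1", header=True), Cell("Q2", header=True)],
    [Cell("North"), Cell("10"), Cell("12")],
]
html_out = table_to_html(rows)
print(html_out)
```

A flat text stream would lose the fact that "Revenue" covers both quarter columns; keeping `rowspan` and `colspan` is what lets downstream code rebuild the grid.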

Architecture Overview

Chandra uses a vision-language model architecture with a layout-aware encoder that segments the document into regions (text blocks, tables, figures, handwriting) before applying specialized decoders for each region type. The table decoder uses a cell-graph approach that explicitly models row and column relationships, while the handwriting decoder uses an attention-based sequence model trained on diverse writing styles.
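
The cell-graph idea can be illustrated with a small data-structure sketch: cells are nodes, and an edge links two cells that share a row or a column. This is a generic illustration under assumed names (`CellNode`, `build_cell_graph`); the source does not describe Chandra's internal representation at this level, and the real decoder is a learned model rather than a rule like this.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class CellNode:
    # Hypothetical node: top-left grid position plus spans.
    text: str
    row: int
    col: int
    rowspan: int = 1
    colspan: int = 1

    def rows(self) -> set[int]:
        """Grid rows this cell occupies."""
        return set(range(self.row, self.row + self.rowspan))

    def cols(self) -> set[int]:
        """Grid columns this cell occupies."""
        return set(range(self.col, self.col + self.colspan))


def build_cell_graph(cells):
    """Return edges (i, j, relation) for cells that share a row or a column."""
    edges = []
    for (i, a), (j, b) in combinations(enumerate(cells), 2):
        if a.rows() & b.rows():
            edges.append((i, j, "same_row"))
        if a.cols() & b.cols():
            edges.append((i, j, "same_col"))
    return edges


cells = [
    CellNode("Revenue", row=0, col=0, colspan=2),  # merged header over two columns
    CellNode("Q1", row=1, col=0),
    CellNode("Q2", row=1, col=1),
    CellNode("10", row=2, col=0),
    CellNode("12", row=2, col=1),
]
edges = build_cell_graph(cells)
for i, j, relation in edges:
    print(cells[i].text, relation, cells[j].text)
```

Because the merged "Revenue" header occupies both columns, it gets a `same_col` edge to every cell beneath it, which is the relationship a flat row-by-row reading would miss.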

Self-Hosting & Configuration

  • Install via pip with Python 3.10+ and PyTorch
  • Download model weights automatically on first run or pre-download for offline use
  • Configure GPU acceleration with CUDA or run on CPU for smaller documents
  • Set output format (JSON, Markdown, HTML) and language preferences
  • Integrate with document processing pipelines via the Python API or CLI
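
The configuration choices above (device, output format, languages) could be collected in a small settings object, sketched below. `OcrSettings`, `pick_device`, and the field names are hypothetical and are not part of Chandra's API; the only real library call is `torch.cuda.is_available()`, and the sketch falls back to the CPU when PyTorch is not installed.

```python
from dataclasses import dataclass, field

# Output formats named in this document.
SUPPORTED_FORMATS = {"json", "markdown", "html"}


def pick_device() -> str:
    """Prefer CUDA when PyTorch reports a usable GPU, otherwise use the CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


@dataclass
class OcrSettings:
    # Hypothetical settings container; field names are illustrative only.
    output_format: str = "markdown"
    languages: list[str] = field(default_factory=lambda: ["en"])
    device: str = field(default_factory=pick_device)

    def __post_init__(self) -> None:
        # Normalize the format name and reject anything outside the known set.
        fmt = self.output_format.lower()
        if fmt not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported output format: {self.output_format!r}")
        self.output_format = fmt


settings = OcrSettings(output_format="JSON", languages=["en", "zh"])
print(settings)
```

Validating the format up front means a typo fails at startup rather than after a long inference run.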

Key Features

  • Table extraction that correctly handles merged cells, multi-line cells, and nested tables
  • Handwriting recognition supporting multiple scripts and writing styles
  • Layout preservation that maintains reading order across complex multi-column pages
  • Batch processing mode for high-throughput document pipelines
  • Language support for documents mixing Latin, CJK, and other scripts
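
Reading order on a multi-column page can be illustrated with a simple geometric heuristic: group blocks into columns by their left edge, then read each column top to bottom. This is a deliberately simplified sketch with hypothetical names (`Block`, `reading_order`, `column_gap`); it is not Chandra's layout model, which the source describes only as a layout-aware encoder.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    x: float  # left edge, in pixels
    y: float  # top edge, in pixels


def reading_order(blocks, column_gap=50.0):
    """Order blocks column by column (left to right), top to bottom within each."""
    columns = []  # each column is a list of blocks with similar left edges
    for block in sorted(blocks, key=lambda b: b.x):
        # Join the current column if this block starts near that column's left edge.
        if columns and block.x - columns[-1][0].x < column_gap:
            columns[-1].append(block)
        else:
            columns.append([block])
    ordered = []
    for column in columns:
        ordered.extend(sorted(column, key=lambda b: b.y))
    return ordered


page = [
    Block("B", x=40, y=300),
    Block("C", x=400, y=100),
    Block("A", x=42, y=100),
    Block("D", x=405, y=320),
]
order = [b.text for b in reading_order(page)]
print(order)  # ['A', 'B', 'C', 'D']
```

A naive top-to-bottom sort would interleave the two columns (A, C, B, D); grouping by column first keeps each column's text contiguous.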

Comparison with Similar Tools

  • Tesseract: general-purpose OCR engine; Chandra focuses on structured document understanding
  • Surya: focused on multilingual text detection; Chandra adds table and form extraction
  • Nougat: specialized for academic papers; Chandra targets a wider range of document types
  • Azure/Google Document AI: cloud services; Chandra runs locally with no per-request API costs

FAQ

Q: Does it require a GPU? A: A GPU is recommended for speed but not required. CPU inference works for smaller documents.

Q: What input formats are supported? A: PDF, PNG, JPEG, TIFF, and BMP. Multi-page PDFs are processed page by page.

Q: How does it handle rotated or skewed documents? A: Chandra includes automatic deskewing and rotation correction as a preprocessing step.

Q: Can I fine-tune it on my own document types? A: Yes. The training pipeline supports fine-tuning on custom labeled datasets.
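
For pipeline integration, the input formats listed in the FAQ suggest a small pre-filtering step before documents are handed to the model in batches. The sketch below uses only the standard library; `supported_inputs` and `batched` are hypothetical helpers rather than Chandra functions, and the `.jpg` and `.tif` suffixes are assumed aliases for JPEG and TIFF.

```python
from pathlib import Path

# Suffixes for the formats named in the FAQ; ".jpg" and ".tif" are assumed aliases.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpeg", ".jpg", ".tiff", ".tif", ".bmp"}


def supported_inputs(paths):
    """Keep only paths whose suffix matches a supported format (case-insensitive)."""
    return [Path(p) for p in paths if Path(p).suffix.lower() in SUPPORTED_SUFFIXES]


def batched(items, size):
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


candidates = ["scan.PDF", "photo.jpg", "notes.docx", "form.tiff", "shot.bmp"]
inputs = supported_inputs(candidates)
batches = list(batched(inputs, 2))
for batch in batches:
    print([p.name for p in batch])
```

Filtering first keeps unsupported files such as `notes.docx` from failing partway through a long batch run.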
