Introduction
Chandra is an open-source OCR model built for the documents that standard OCR tools struggle with: dense tables with merged cells, multi-column forms, handwritten annotations, and mixed-layout pages. Instead of a flat text stream, it outputs structured data that preserves the document's spatial layout.
What Chandra Does
- Extracts text from complex tables with merged cells, nested headers, and spanning rows
- Recognizes handwritten text alongside printed content in the same document
- Preserves document layout including columns, sections, and spatial relationships
- Outputs structured formats (JSON, Markdown, HTML) that maintain table and form structure
- Processes scanned PDFs, photographs of documents, and screenshots
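Structured output is only useful if a downstream consumer can rebuild the table from it. The sketch below assumes that HTML output marks merged cells with the standard `rowspan`/`colspan` attributes. That is an assumption, not something confirmed here. It expands such a table into a rectangular grid using only Python's standard `html.parser`. `table_to_grid` and the sample table are hypothetical.

```python
from html.parser import HTMLParser


class _TableParser(HTMLParser):
    """Collect rows of (text, rowspan, colspan) from one HTML table."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[tuple[str, int, int]]] = []
        self._cell = None  # [text_parts, rowspan, colspan] while inside <td>/<th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            a = dict(attrs)
            self._cell = [[], int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            parts, rowspan, colspan = self._cell
            self.rows[-1].append(("".join(parts).strip(), rowspan, colspan))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:  # ignore whitespace between tags
            self._cell[0].append(data)


def table_to_grid(html: str) -> list[list[str]]:
    """Expand merged cells so every grid slot holds the text of the cell covering it."""
    parser = _TableParser()
    parser.feed(html)
    parser.close()
    grid: dict[tuple[int, int], str] = {}
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in grid:  # skip slots already filled by a rowspan from above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


sample_html = """
<table>
  <tr><th rowspan="2">Region</th><th colspan="2">Sales</th></tr>
  <tr><th>Q1</th><th>Q2</th></tr>
  <tr><td>North</td><td>10</td><td>12</td></tr>
</table>
"""
print(table_to_grid(sample_html))
# → [['Region', 'Sales', 'Sales'], ['Region', 'Q1', 'Q2'], ['North', '10', '12']]
```

Copying a merged cell's text into every slot it covers keeps each row self-describing: both quarterly columns still sit under "Sales". An alternative is to keep only the top-left copy and leave the covered slots empty.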
Architecture Overview
Chandra uses a vision-language model architecture with a layout-aware encoder that segments the document into regions (text blocks, tables, figures, handwriting) before applying specialized decoders for each region type. The table decoder uses a cell-graph approach that explicitly models row and column relationships, while the handwriting decoder uses an attention-based sequence model trained on diverse writing styles.
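A cell graph can be illustrated apart from the model. Each table cell is a node, and edges record which cells share a row or a column. These edges let a merged header relate to every cell beneath it. The sketch below is a hypothetical, rule-based illustration of the data structure only. Chandra's decoder learns these relationships, and `Cell` and `build_cell_graph` are not part of its API.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Cell:
    text: str
    row: int
    col: int
    rowspan: int = 1
    colspan: int = 1

    def rows(self) -> range:
        return range(self.row, self.row + self.rowspan)

    def cols(self) -> range:
        return range(self.col, self.col + self.colspan)


def _overlap(a: range, b: range) -> bool:
    """True when two half-open index ranges share at least one index."""
    return a.start < b.stop and b.start < a.stop


def build_cell_graph(cells: list[Cell]) -> dict[tuple[int, int], str]:
    """Map each related pair of cell indices to "same_row" or "same_col".

    In a valid table two distinct cells never share both a row and a column
    (they would overlap), so one relation per pair is enough.
    """
    edges: dict[tuple[int, int], str] = {}
    for (i, a), (j, b) in combinations(enumerate(cells), 2):
        if _overlap(a.rows(), b.rows()):
            edges[(i, j)] = "same_row"
        elif _overlap(a.cols(), b.cols()):
            edges[(i, j)] = "same_col"
    return edges


# "Region" spans two rows; "Sales" spans the two quarter columns.
sample_cells = [
    Cell("Region", 0, 0, rowspan=2),
    Cell("Sales", 0, 1, colspan=2),
    Cell("Q1", 1, 1),
    Cell("Q2", 1, 2),
]
print(build_cell_graph(sample_cells))
```

In the output, the merged "Sales" header has a `same_col` edge to both Q1 and Q2. A flat left-to-right reading of the table would not capture that relationship.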
Self-Hosting & Configuration
- Install via pip with Python 3.10+ and PyTorch
- Model weights download automatically on first run; pre-download them for offline use
- Configure GPU acceleration with CUDA or run on CPU for smaller documents
- Set output format (JSON, Markdown, HTML) and language preferences
- Integrate with document processing pipelines via the Python API or CLI
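The settings above (output format, language preferences, GPU or CPU) can be checked once, before any pages are processed. The sketch below uses a hypothetical `OcrConfig` class, not Chandra's actual configuration API, and its default values are assumptions. The only real library call is PyTorch's `torch.cuda.is_available()`. The device check falls back to CPU when PyTorch is not installed.

```python
import importlib.util
from dataclasses import dataclass, field

# Formats named in the list above; these lowercase spellings are an assumption.
SUPPORTED_FORMATS = {"json", "markdown", "html"}


def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" if PyTorch is installed and sees a CUDA GPU, else "cpu"."""
    if prefer_gpu and importlib.util.find_spec("torch") is not None:
        import torch  # imported lazily so the sketch also runs without PyTorch

        if torch.cuda.is_available():
            return "cuda"
    return "cpu"


@dataclass
class OcrConfig:
    """Hypothetical settings object; not Chandra's real configuration API."""

    output_format: str = "markdown"
    languages: list[str] = field(default_factory=lambda: ["en"])
    device: str = field(default_factory=pick_device)

    def __post_init__(self) -> None:
        fmt = self.output_format.lower()
        if fmt not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported output format: {self.output_format!r}")
        self.output_format = fmt
        if self.device not in ("cuda", "cpu"):
            raise ValueError(f"unsupported device: {self.device!r}")
        if not self.languages:
            raise ValueError("at least one language is required")


cfg = OcrConfig(output_format="HTML", languages=["en", "ja"])
print(cfg)
```

Failing fast on a bad format or device name avoids discovering the mistake only after the model weights have loaded.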
Key Features
- Table extraction that correctly handles merged cells, multi-line cells, and nested tables
- Handwriting recognition supporting multiple scripts and writing styles
- Layout preservation that maintains reading order across complex multi-column pages
- Batch processing mode for high-throughput document pipelines
- Language support for documents mixing Latin, CJK, and other scripts
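Reading order on multi-column pages is often rebuilt geometrically once regions have been detected. The sketch below is a simple heuristic assumed for illustration, not Chandra's method. It has three rules:

- Blocks wider than a fraction of the page, such as titles and footers, split the page into horizontal bands.
- Within each band, blocks whose horizontal extents overlap form a column.
- Columns are read left to right, and each column top to bottom.

`Block`, `reading_order`, and the `span_ratio` threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    label: str
    x0: float  # left edge
    y0: float  # top edge (y grows downward, as in image coordinates)
    x1: float  # right edge
    y1: float  # bottom edge


def _columns(blocks: list[Block]) -> list[list[Block]]:
    """Group blocks with overlapping x-extents into columns, left to right."""
    columns: list[dict] = []
    for b in sorted(blocks, key=lambda b: b.x0):
        if columns and b.x0 < columns[-1]["right"]:  # overlaps the current column
            columns[-1]["right"] = max(columns[-1]["right"], b.x1)
            columns[-1]["members"].append(b)
        else:
            columns.append({"right": b.x1, "members": [b]})
    return [sorted(c["members"], key=lambda b: b.y0) for c in columns]


def reading_order(blocks: list[Block], page_width: float, span_ratio: float = 0.6) -> list[Block]:
    """Order blocks band by band, reading each band's columns left to right."""

    def is_spanning(b: Block) -> bool:
        return (b.x1 - b.x0) >= span_ratio * page_width

    breaks = sorted((b for b in blocks if is_spanning(b)), key=lambda b: b.y0)
    body = [b for b in blocks if not is_spanning(b)]
    ordered: list[Block] = []
    top = float("-inf")
    for brk in breaks + [None]:  # None closes the last band at the page bottom
        bottom = brk.y0 if brk is not None else float("inf")
        band = [b for b in body if top <= b.y0 < bottom]
        for column in _columns(band):
            ordered.extend(column)
        if brk is not None:
            ordered.append(brk)
            top = brk.y0
    return ordered


# Two-column page with a full-width title and footer.
sample_blocks = [
    Block("title", 5, 0, 95, 10),
    Block("A", 5, 15, 45, 40),
    Block("C", 55, 15, 95, 40),
    Block("B", 5, 45, 45, 70),
    Block("D", 55, 45, 95, 70),
    Block("footer", 5, 80, 95, 90),
]
print([b.label for b in reading_order(sample_blocks, page_width=100)])
# → ['title', 'A', 'B', 'C', 'D', 'footer']
```

Sorting the same blocks purely top to bottom would interleave the columns as A, C, B, D. Grouping by column first is what keeps each column's text contiguous.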
Comparison with Similar Tools
- Tesseract: general-purpose OCR engine; Chandra is built for structured document understanding
- Surya: focused on multilingual text detection; Chandra adds table and form extraction
- Nougat: specialized for academic papers; Chandra targets a broader range of document types
- Azure AI Document Intelligence / Google Document AI: cloud services; Chandra runs locally with no per-request API costs
FAQ
Q: Does it require a GPU? A: A GPU is recommended for speed but not required. CPU inference works for smaller documents.
Q: What input formats are supported? A: PDF, PNG, JPEG, TIFF, and BMP. Multi-page PDFs are processed page by page.
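Processing page by page also keeps memory bounded in a batch pipeline. Pages are pulled lazily and sent to the model a few at a time. In this sketch, `ocr_batch` is a hypothetical stand-in for whatever call runs the model on a list of pages, and `process_pages` and the batch size of 8 are assumptions. Rendering pages from the PDF is out of scope.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items, consuming lazily."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def process_pages(
    pages: Iterable[T],
    ocr_batch: Callable[[list[T]], list[R]],  # hypothetical model call
    batch_size: int = 8,
) -> list[R]:
    """Run `ocr_batch` over pages in order, a batch at a time."""
    results: list[R] = []
    for batch in batched(pages, batch_size):
        out = ocr_batch(batch)
        if len(out) != len(batch):
            raise RuntimeError("ocr_batch must return one result per page")
        results.extend(out)
    return results


# Stand-in model that just tags each page number.
print(process_pages(range(5), lambda pages: [f"page-{p}" for p in pages], batch_size=2))
# → ['page-0', 'page-1', 'page-2', 'page-3', 'page-4']
```

Because `pages` can be any iterable, a generator that renders one page at a time never holds the whole document in memory.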
Q: How does it handle rotated or skewed documents? A: Chandra includes automatic deskewing and rotation correction as a preprocessing step.
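The deskewing method Chandra uses is not described here. As an illustration, a classic technique for small skew angles is a projection-profile search:

1. Try a range of candidate angles.
2. Undo each candidate rotation.
3. Keep the angle at which the ink concentrates into the fewest rows.

The sketch works on foreground pixel coordinates and handles small tilts only; coarse 90- or 180-degree rotation would need a separate step. The function name and the 5-degree search range on either side of level are assumptions.

```python
import math
from collections import Counter


def estimate_skew_deg(points, max_angle=5.0, step=0.5):
    """Return the tilt (degrees) whose correction makes text rows sharpest.

    points: (x, y) coordinates of foreground pixels, y growing downward.
    A positive result means text lines descend to the right.
    """
    n_steps = int(round(2 * max_angle / step)) + 1
    best_angle, best_score = 0.0, -1
    for i in range(n_steps):
        angle = -max_angle + i * step
        a = math.radians(angle)
        sin_a, cos_a = math.sin(a), math.cos(a)
        # Rotate each point by -angle and keep only its new row index.
        profile = Counter(round(y * cos_a - x * sin_a) for x, y in points)
        # Ink concentrated in few rows gives a large sum of squared counts.
        score = sum(count * count for count in profile.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


# Synthetic page: five text lines tilted by 3 degrees.
tilt = math.tan(math.radians(3.0))
sample_points = [(x, y0 + x * tilt) for y0 in (10, 30, 50, 70, 90) for x in range(201)]
print(estimate_skew_deg(sample_points))  # → 3.0
```

The sum of squared row counts peaks when each text line falls into a single row. A finer `step` trades speed for precision.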
Q: Can I fine-tune it on my own document types? A: Yes. The training pipeline supports fine-tuning on custom labeled datasets.