Scripts · March 31, 2026 · 1 min read

Surya — Document OCR for 90+ Languages

Surya is a document OCR toolkit with 19.5K+ GitHub stars. It offers text recognition in 90+ languages, layout analysis, table detection, reading order, and LaTeX OCR. It benchmarks favorably against cloud OCR services.

TokRepo Featured · Community
Quick Start

Try it first, then decide whether to dig deeper


# Install
pip install surya-ocr

# OCR a document image
surya_ocr image.png

# Detect text lines
surya_detect image.png

# Analyze layout (tables, headers, images)
surya_layout image.png

# Table recognition
surya_table image.png
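
To run the same command over a folder of images, the CLI calls above can be wrapped in a small script. This is a minimal sketch, not part of Surya: it only shells out to the four commands listed above, and the helper names (`build_command`, `run_on_folder`) and the set of image suffixes are assumptions made for illustration.

```python
import subprocess
from pathlib import Path

# Task name -> CLI command, taken from the quick-start commands above.
SURYA_COMMANDS = {
    "ocr": "surya_ocr",
    "detect": "surya_detect",
    "layout": "surya_layout",
    "table": "surya_table",
}

# Assumption: which file suffixes count as images; adjust to your inputs.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_command(task: str, path: Path) -> list[str]:
    """Return the argv list for one task on one file (hypothetical helper)."""
    if task not in SURYA_COMMANDS:
        raise ValueError(f"unknown task: {task!r}")
    return [SURYA_COMMANDS[task], str(path)]


def run_on_folder(task: str, folder: Path) -> list[Path]:
    """Run one task on every image in a folder, stopping on the first failure."""
    processed = []
    for path in sorted(folder.iterdir()):
        if path.suffix.lower() in IMAGE_SUFFIXES:
            # check=True raises CalledProcessError if the command exits non-zero.
            subprocess.run(build_command(task, path), check=True)
            processed.append(path)
    return processed
```

For example, `run_on_folder("layout", Path("scans"))` would call `surya_layout` once per image in a `scans` directory, assuming `surya-ocr` is installed and on the PATH.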

Introduction

Surya is a document OCR toolkit that performs text recognition in 90+ languages, benchmarking favorably against cloud OCR services like Google Cloud Vision and AWS Textract. With 19,500+ GitHub stars, Surya provides line-level text detection, page layout analysis (tables, images, headers), reading order detection, table row/column recognition, and LaTeX OCR for mathematical equations. It works on images and PDFs, making it ideal for document processing pipelines in AI applications.

Best for: Developers building document parsing, data extraction, or digitization pipelines
Works with: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf
Languages: 90+ languages for OCR, any language for detection and layout


Key Features

  • 90+ language OCR: Text recognition benchmarked against cloud services
  • Text detection: Line-level detection in any language
  • Layout analysis: Identify tables, images, headers, footers on pages
  • Reading order: Detect correct reading sequence of text blocks
  • Table recognition: Extract row and column structure from tables
  • LaTeX OCR: Recognize mathematical equations and formulas
  • PDF support: Process multi-page PDF documents directly
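
To make the reading-order feature concrete, here is a toy sketch of what ordering text blocks means. It is not how Surya works: it is a naive geometric heuristic (top to bottom, then left to right within a row), and the `Block` type, the `naive_reading_order` function, and the `row_tolerance` parameter are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A text block with a bounding box in page coordinates (hypothetical type)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def naive_reading_order(blocks, row_tolerance=10.0):
    """Order blocks top-to-bottom, then left-to-right within each row."""
    rows = []  # each row holds blocks whose top edges are close together
    for block in sorted(blocks, key=lambda b: b.y0):
        # Compare against the first block of the current row.
        if rows and abs(rows[-1][0].y0 - block.y0) <= row_tolerance:
            rows[-1].append(block)
        else:
            rows.append([block])
    return [b for row in rows for b in sorted(row, key=lambda b: b.x0)]


blocks = [
    Block("right", 300, 102, 500, 120),
    Block("title", 50, 10, 500, 40),
    Block("left", 50, 100, 250, 120),
]
ordered = [b.text for b in naive_reading_order(blocks)]
```

A heuristic like this reads straight across multi-column pages, interleaving the columns, which is one reason a dedicated reading-order model is useful for real documents.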

FAQ

Q: What is Surya?
A: Surya is a document OCR toolkit with 19.5K+ GitHub stars that performs text recognition in 90+ languages, layout analysis, table detection, and LaTeX OCR. It benchmarks favorably against cloud services like Google Cloud Vision and AWS Textract.

Q: How do I install Surya?
A: Run pip install surya-ocr. Then use CLI commands such as surya_ocr image.png for text recognition or surya_layout image.png for page layout analysis.

Q: Is Surya free to use?
A: The code is GPL-licensed. Model weights are free for research, personal use, and startups under $2M in funding/revenue. Commercial licensing is available for larger organizations.



Source and Credits

Created by Vik Paruchuri. Code: GPL. Models: AI Pubs Open Rail-M. Repository: VikParuchuri/surya (19,500+ GitHub stars)
