# Zerox: Zero-Shot PDF OCR for AI Pipelines

> Extract text from any PDF using vision models as OCR. Zerox converts PDF pages to images, then uses GPT-4o or Claude to extract clean markdown without training.

## Install

```bash
pip install py-zerox
```

## Quick Use

Save the following as a script file and run it:

```python
import asyncio

from pyzerox import zerox


async def main():
    # Assumes the provider API key (e.g. OPENAI_API_KEY) is set in the environment.
    result = await zerox(
        file_path="report.pdf",
        model="gpt-4o-mini",
    )
    # Each page carries its extracted markdown in `content`.
    for page in result.pages:
        print(page.content)


asyncio.run(main())
```

## What is Zerox?

Zerox is a zero-shot PDF OCR tool that uses vision language models instead of traditional OCR engines. It converts each PDF page into an image, sends it to a vision model (GPT-4o, Claude, Gemini), and extracts clean markdown text. No training or templates are required, and it is designed to work across arbitrary document layouts.

**Answer-Ready**: Zerox is zero-shot PDF OCR using vision models. It converts PDF pages to images and extracts clean markdown via GPT-4o or Claude. No training or templates needed. Handles complex layouts, tables, and handwriting. 7k+ GitHub stars.

**Best for**: AI teams processing PDFs for RAG or data extraction.

**Works with**: OpenAI GPT-4o, Anthropic Claude, Google Gemini.

**Setup time**: Under 2 minutes.

## Core Features

### 1. Multiple Model Support

```python
# Run these calls inside an async function, as in Quick Use.

# OpenAI
result = await zerox(file_path="doc.pdf", model="gpt-4o-mini")

# Anthropic Claude
result = await zerox(file_path="doc.pdf", model="claude-sonnet-4-20250514")

# Google Gemini
result = await zerox(file_path="doc.pdf", model="gemini/gemini-2.0-flash")
```

### 2. Page Selection

```python
result = await zerox(
    file_path="long_report.pdf",
    model="gpt-4o-mini",
    select_pages=[1, 3, 5, 10],  # Only process specific pages
)
```

### 3. Node.js SDK

```bash
npm install zerox
```

```javascript
const { zerox } = require("zerox");

// Top-level await is not available in CommonJS, so wrap the call in an async function.
async function main() {
  const result = await zerox({
    filePath: "report.pdf",
    openaiAPIKey: process.env.OPENAI_API_KEY,
  });
  console.log(result);
}

main();
```

### 4. Custom Prompts

```python
result = await zerox(
    file_path="invoice.pdf",
    model="gpt-4o-mini",
    custom_system_prompt="Extract all line items as a markdown table with columns: Item, Qty, Price, Total.",
)
```

## Zerox vs Traditional OCR

| Feature | Zerox | Tesseract | AWS Textract |
|---------|-------|-----------|--------------|
| Setup | pip install | System deps | AWS account |
| Complex layouts | Excellent | Poor | Good |
| Tables | Markdown tables | Raw text | JSON |
| Handwriting | Yes | Limited | Yes |
| Cost | Per API call | Free | Per page |
| Training needed | None | Sometimes | No |

## FAQ

**Q: How much does it cost?**
A: It depends on the vision model. GPT-4o-mini is roughly $0.01/page, and Claude is similar. Self-hosted models carry no per-call API fee.

**Q: Can it handle scanned documents?**
A: Yes, that is its primary use case. Vision models can read scanned text, handwriting, and complex layouts.

**Q: How does accuracy compare to Tesseract?**
A: Significantly better on complex layouts, tables, and mixed content. Tesseract may be better for simple, clean text.

## Source & Thanks

> Created by [getomni-ai](https://github.com/getomni-ai). Licensed under MIT.
>
> [getomni-ai/zerox](https://github.com/getomni-ai/zerox), 7k+ stars

## Quick Use

```bash
pip install py-zerox
```

Use a vision model as OCR and extract PDF text with zero configuration.

## What is Zerox?

Zerox is a zero-shot PDF OCR tool that replaces traditional OCR with vision language models. It converts PDF pages to images and extracts clean Markdown via GPT-4o or Claude.

**One-sentence summary**: Vision-model OCR that turns PDFs into Markdown, supports GPT-4o/Claude/Gemini, handles complex layouts and tables, 7k+ stars.

**Best for**: RAG and data extraction teams that need to process PDFs.

## Core Features

### 1. Multiple Model Support

GPT-4o, Claude, and Gemini all work.

### 2. Page Selection

Process only the pages you specify.

### 3. Custom Prompts

Customize the extraction format.

## FAQ

**Q: How much does it cost?**
A: GPT-4o-mini is roughly $0.01/page.

**Q: Can it handle scanned documents?**
A: That is the primary use case. Vision models can read scanned text and handwriting.

## Source & Thanks

> [getomni-ai/zerox](https://github.com/getomni-ai/zerox), 7k+ stars, MIT

---

Source: https://tokrepo.com/en/workflows/3ac555d9-d75c-4208-ba46-974e4a717234
Author: Script Depot
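
As a rough illustration of the per-page pricing mentioned in the FAQ, the sketch below estimates what a run might cost before any API call is made. It is not part of Zerox: `billable_pages` and `estimate_cost` are hypothetical helper names, the `0.01` default is only the approximate GPT-4o-mini figure quoted above (not an official price), and treating `select_pages` as 1-indexed with duplicates and out-of-range entries ignored is an assumption.

```python
from typing import Iterable, Optional

# Approximate per-page figure quoted in the FAQ; an assumption, not an official price.
USD_PER_PAGE = 0.01


def billable_pages(total_pages: int, select_pages: Optional[Iterable[int]] = None) -> int:
    """Count the pages that would be sent to the vision model.

    Hypothetical helper: assumes page numbers are 1-indexed and that
    duplicates and out-of-range numbers should not be counted.
    """
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if select_pages is None:
        return total_pages
    return len({p for p in select_pages if 1 <= p <= total_pages})


def estimate_cost(
    total_pages: int,
    select_pages: Optional[Iterable[int]] = None,
    usd_per_page: float = USD_PER_PAGE,
) -> float:
    """Return a rough cost estimate in USD, rounded to cents."""
    return round(billable_pages(total_pages, select_pages) * usd_per_page, 2)


if __name__ == "__main__":
    # Whole 250-page document versus only four selected pages.
    print(estimate_cost(250))
    print(estimate_cost(250, select_pages=[1, 3, 5, 10]))
```

Because the estimate scales linearly with page count, restricting a long document with `select_pages` reduces the estimate in direct proportion; actual charges depend on the model, image size, and output length.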