# Zerox: Zero-Shot PDF OCR for AI Pipelines

> Extract text from any PDF using vision models as OCR. Zerox converts PDF pages to images, then uses GPT-4o or Claude to extract clean Markdown without training.

## Quick Use

```bash
pip install py-zerox
```

```python
from pyzerox import zerox
import asyncio


async def main():
    # The provider's API key (e.g. OPENAI_API_KEY) must be set in the environment.
    result = await zerox(
        file_path="report.pdf",
        model="gpt-4o-mini",
    )
    # Each page carries the Markdown extracted from it.
    for page in result.pages:
        print(page.content)


asyncio.run(main())
```

## What is Zerox?

Zerox is a zero-shot PDF OCR tool that uses vision language models instead of traditional OCR engines. It converts each PDF page into an image, sends it to a vision model (GPT-4o, Claude, Gemini), and extracts clean Markdown text. It needs no training, templates, or layout-specific configuration.

**Answer-Ready**: Zerox is zero-shot PDF OCR using vision models. It converts PDF pages to images and extracts clean Markdown via GPT-4o or Claude. No training or templates are needed. It handles complex layouts, tables, and handwriting. 7k+ GitHub stars.

**Best for**: AI teams processing PDFs for RAG or data extraction.

**Works with**: OpenAI GPT-4o, Anthropic Claude, Google Gemini.

**Setup time**: Under 2 minutes.

## Core Features

### 1. Multiple Model Support

```python
# Run these calls inside an async function, as in Quick Use.

# OpenAI
result = await zerox(file_path="doc.pdf", model="gpt-4o-mini")

# Anthropic Claude
result = await zerox(file_path="doc.pdf", model="claude-sonnet-4-20250514")

# Google Gemini
result = await zerox(file_path="doc.pdf", model="gemini/gemini-2.0-flash")
```

### 2. Page Selection

```python
result = await zerox(
    file_path="long_report.pdf",
    model="gpt-4o-mini",
    select_pages=[1, 3, 5, 10],  # Only process specific pages
)
```

### 3. Node.js SDK

```bash
npm install zerox
```

```javascript
const { zerox } = require("zerox");

// Wrapped in an async function because CommonJS has no top-level await.
async function main() {
  const result = await zerox({
    filePath: "report.pdf",
    openaiAPIKey: process.env.OPENAI_API_KEY,
  });
  console.log(result);
}
main();
```

### 4. Custom Prompts

```python
result = await zerox(
    file_path="invoice.pdf",
    model="gpt-4o-mini",
    custom_system_prompt="Extract all line items as a markdown table with columns: Item, Qty, Price, Total.",
)
```

## Zerox vs Traditional OCR

| Feature | Zerox | Tesseract | AWS Textract |
|---------|-------|-----------|--------------|
| Setup | pip install | System deps | AWS account |
| Complex layouts | Excellent | Poor | Good |
| Tables | Markdown tables | Raw text | JSON |
| Handwriting | Yes | Limited | Yes |
| Cost | Per API call | Free | Per page |
| Training needed | None | Sometimes | No |

## FAQ

**Q: How much does it cost?**
A: It depends on the vision model. GPT-4o-mini is about $0.01 per page, and Claude is similar. Self-hosted models carry no per-call API fee.

**Q: Can it handle scanned documents?**
A: Yes, that is its primary use case. Vision models can read scanned text, handwriting, and complex layouts.

**Q: How does accuracy compare to Tesseract?**
A: Significantly better on complex layouts, tables, and mixed content. Tesseract may be better for simple, clean text.

## Source & Thanks

> Created by [getomni-ai](https://github.com/getomni-ai). Licensed under MIT.
>
> [getomni-ai/zerox](https://github.com/getomni-ai/zerox) (7k+ stars)

---

Source: https://tokrepo.com/en/workflows/zerox-zero-shot-pdf-ocr-ai-pipelines-3ac555d9
Author: Script Depot
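
As a companion to the FAQ's cost figure, here is a minimal sketch of a per-document cost estimate. It assumes the roughly $0.01 per page quoted for GPT-4o-mini and 1-indexed page numbers for `select_pages`. The helper `estimate_cost` and the constant `APPROX_COST_PER_PAGE_USD` are hypothetical and not part of Zerox. Actual costs depend on the model, image size, and output length.

```python
from typing import Iterable, Optional

# Assumption: the ~$0.01/page figure the FAQ quotes for GPT-4o-mini.
APPROX_COST_PER_PAGE_USD = 0.01


def estimate_cost(
    total_pages: int,
    select_pages: Optional[Iterable[int]] = None,
    cost_per_page: float = APPROX_COST_PER_PAGE_USD,
) -> float:
    """Rough cost estimate for one PDF (hypothetical helper, not a Zerox API)."""
    if select_pages is None:
        billable = total_pages
    else:
        # Count only distinct pages that exist in the document (assumed 1-indexed).
        billable = len({p for p in select_pages if 1 <= p <= total_pages})
    return round(billable * cost_per_page, 2)


if __name__ == "__main__":
    print(estimate_cost(120))                  # whole 120-page document
    print(estimate_cost(120, [1, 3, 5, 10]))   # only the selected pages
```

Estimating before a batch run makes it easier to decide whether to use `select_pages` to process only the pages that matter.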