What is Zerox?
Zerox is a zero-shot PDF OCR tool that replaces traditional OCR with vision-language models. It converts each PDF page to an image and uses a model such as GPT-4o, Claude, or Gemini to extract clean Markdown.
In one sentence: Vision-model OCR that turns PDFs into Markdown with GPT-4o, Claude, or Gemini, handles complex layouts and tables, and has 7k+ stars.
For: Teams building RAG pipelines or handling PDFs.
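The page-by-page flow described above (render each page to an image, ask a vision model for Markdown, join the results) can be sketched roughly as follows. This is an illustration of the general pattern, not Zerox's actual code: `pdf_to_markdown`, `render_page`, and `transcribe` are hypothetical names, and the stubs stand in for a real PDF renderer and a real model call.

```python
from typing import Callable, Iterable, List

# Hypothetical stand-ins (not part of Zerox):
#   render_page: turns one PDF page number into image bytes
#   transcribe:  sends image bytes to a vision model, returns Markdown
RenderFn = Callable[[int], bytes]
TranscribeFn = Callable[[bytes], str]


def pdf_to_markdown(page_numbers: Iterable[int],
                    render_page: RenderFn,
                    transcribe: TranscribeFn) -> str:
    """Render each page to an image, transcribe it, and join the results."""
    chunks: List[str] = []
    for number in page_numbers:
        image = render_page(number)                # PDF page -> image bytes
        chunks.append(transcribe(image).strip())   # image -> Markdown text
    # Separate pages with a blank line so Markdown blocks do not merge.
    return "\n\n".join(chunks)


if __name__ == "__main__":
    # Stub implementations so the sketch runs without a PDF or an API key.
    def fake_render(n: int) -> bytes:
        return f"page-{n}".encode()

    def fake_transcribe(img: bytes) -> str:
        return f"# Text from {img.decode()}"

    print(pdf_to_markdown([1, 2], fake_render, fake_transcribe))
```

Passing the renderer and the model call in as functions keeps the sketch independent of any particular provider, which mirrors the multi-model idea without assuming a specific API.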
Core Features
1. Multi-Model Support
Works with GPT-4o, Claude, and Gemini.
2. Page Selection
Process only the pages you specify instead of the whole document.
3. Custom Prompts
Customize the extraction format by supplying your own prompt.
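As a rough illustration of page selection and custom prompts, the sketch below prepares both inputs before any model call. The helpers `parse_pages` and `build_prompt`, the range syntax "1-3,5", and the default prompt text are all assumptions made for this example; they are not Zerox functions, and the real option names should be taken from the project's documentation.

```python
from typing import List, Optional

# Hypothetical base prompt, used only for this example.
DEFAULT_PROMPT = "Convert this page to Markdown."


def parse_pages(spec: str) -> List[int]:
    """Expand a spec such as "1-3,5" into sorted, unique 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    if any(p < 1 for p in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)


def build_prompt(extra: Optional[str] = None) -> str:
    """Append custom format instructions to the base prompt, if given."""
    if not extra:
        return DEFAULT_PROMPT
    return f"{DEFAULT_PROMPT} {extra.strip()}"


if __name__ == "__main__":
    print(parse_pages("1-3,5"))
    print(build_prompt("Render tables as HTML."))
```

The resulting page list and prompt string would then be handed to whatever page-selection and prompt options the Zerox client exposes.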
FAQ
Q: How much does it cost? A: About $0.01 per page with GPT-4o-mini.
Q: Does it handle scans? A: Yes, this is its main use case: vision models read scanned text and handwriting.
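To turn the per-page cost quoted in the FAQ into a budget, a back-of-the-envelope estimate is enough. The sketch below takes the approximate $0.01 per page figure for GPT-4o-mini at face value; actual cost depends on image size, output length, and current provider pricing, and `estimate_cost` is a hypothetical helper, not part of Zerox.

```python
from decimal import Decimal

# Approximate GPT-4o-mini figure from the FAQ; treat it as an assumption.
COST_PER_PAGE = Decimal("0.01")


def estimate_cost(pages: int, cost_per_page: Decimal = COST_PER_PAGE) -> Decimal:
    """Return the estimated cost in dollars, rounded to cents."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return (pages * cost_per_page).quantize(Decimal("0.01"))


if __name__ == "__main__":
    print(estimate_cost(250))  # prints 2.50
```

At that rate a 250-page document comes to about $2.50, and a 10,000-page archive to about $100.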