ScriptsApr 8, 2026·2 min read

Zerox — Zero-Shot PDF OCR for AI Pipelines

Extract text from any PDF using vision models as OCR. Zerox converts PDF pages to images then uses GPT-4o or Claude to extract clean markdown without training.

SC
Script Depot · Community
Quick Use

Use it first, then decide how deep to go

This block should tell both the user and the agent what to copy, install, and apply first.

pip install py-zerox
from pyzerox import zerox
import asyncio

async def main():
    result = await zerox(
        file_path="report.pdf",
        model="gpt-4o-mini",
    )
    for page in result.pages:
        print(page.content)

asyncio.run(main())

What is Zerox?

Zerox is a zero-shot PDF OCR tool that uses vision language models instead of traditional OCR engines. It converts each PDF page into an image, sends it to a vision model (GPT-4o, Claude, Gemini), and extracts clean markdown text. No training, no templates, no configuration — it just works on any document layout.

Answer-Ready: Zerox is zero-shot PDF OCR using vision models. Converts PDF pages to images, extracts clean markdown via GPT-4o or Claude. No training or templates needed. Handles complex layouts, tables, and handwriting. 7k+ GitHub stars.

Best for: AI teams processing PDFs for RAG or data extraction. Works with: OpenAI GPT-4o, Anthropic Claude, Google Gemini. Setup time: Under 2 minutes.

Core Features

1. Multiple Model Support

# OpenAI
result = await zerox(file_path="doc.pdf", model="gpt-4o-mini")

# Anthropic Claude
result = await zerox(file_path="doc.pdf", model="claude-sonnet-4-20250514")

# Google Gemini
result = await zerox(file_path="doc.pdf", model="gemini/gemini-2.0-flash")

2. Page Selection

result = await zerox(
    file_path="long_report.pdf",
    model="gpt-4o-mini",
    select_pages=[1, 3, 5, 10],  # Only process specific pages
)

3. Node.js SDK

npm install zerox
const { zerox } = require("zerox");
const result = await zerox({
  filePath: "report.pdf",
  openaiAPIKey: process.env.OPENAI_API_KEY,
});

4. Custom Prompts

result = await zerox(
    file_path="invoice.pdf",
    model="gpt-4o-mini",
    custom_system_prompt="Extract all line items as a markdown table with columns: Item, Qty, Price, Total.",
)

Zerox vs Traditional OCR

Feature Zerox Tesseract AWS Textract
Setup pip install System deps AWS account
Complex layouts Excellent Poor Good
Tables Markdown tables Raw text JSON
Handwriting Yes Limited Yes
Cost Per API call Free Per page
Training needed None Sometimes No

FAQ

Q: How much does it cost? A: Depends on the vision model. GPT-4o-mini is ~$0.01/page, Claude is similar. Self-hosted models are free.

Q: Can it handle scanned documents? A: Yes, that is its primary use case. Vision models can read scanned text, handwriting, and complex layouts.

Q: How does accuracy compare to Tesseract? A: Significantly better on complex layouts, tables, and mixed content. Tesseract may be better for simple, clean text.

🙏

Source & Thanks

Created by getomni-ai. Licensed under MIT.

getomni-ai/zerox — 7k+ stars

Discussion

Sign in to join the discussion.
No comments yet. Be the first to share your thoughts.

Related Assets